Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 )
Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................

Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@412
PS3, Line 412: ConsumeBytes(sizeof(value));
> The parquet DictionaryPageHeader contains a num_values field. Look where we
Using num_values * size of type might not work when dealing with
variable-sized types like string.
I would recommend keeping a count of the bytes used in a local variable and
then calling ConsumeBytes once when we exit the loop. This is because mtrackp
loops over all its parent trackers to update memory usage every time
Consume(num_bytes) is called. It is not a huge optimization, but I think it is
worth avoiding another loop inside the hot path.

http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-test.cc
File be/src/util/dict-test.cc:

http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-test.cc@39
PS3, Line 39: tracker
Can you add test cases that verify that the encoder/decoder is tracking
memory correctly? You can do this by using tracker.consumption() to get the
number of bytes consumed and comparing it to an expected size that you
calculate separately.

--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Fri, 29 Sep 2017 01:16:40 +0000
Gerrit-HasComments: Yes
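
For illustration only, the two review comments above (count bytes locally and report once after the loop, then check the result against tracker.consumption()) could be sketched roughly as follows. This is not the code from the patch: ToyTracker is a hypothetical stand-in for a hierarchical memory tracker rather than Impala's MemTracker, and LoadDictionary and its byte accounting (sizeof(std::string) plus the string length) are assumptions made up for the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a hierarchical memory tracker (NOT Impala's
// MemTracker). Each Consume() call walks the whole parent chain.
class ToyTracker {
 public:
  explicit ToyTracker(ToyTracker* parent = nullptr) : parent_(parent) {}

  // Adds num_bytes to this tracker and to every ancestor, so one call costs
  // one step per level of the hierarchy.
  void Consume(int64_t num_bytes) {
    for (ToyTracker* t = this; t != nullptr; t = t->parent_) {
      t->consumption_ += num_bytes;
      ++t->updates_;
    }
  }

  int64_t consumption() const { return consumption_; }
  // Number of times this tracker was touched by a Consume() call.
  int64_t updates() const { return updates_; }

 private:
  ToyTracker* parent_;
  int64_t consumption_ = 0;
  int64_t updates_ = 0;
};

// Hypothetical loader: copies variable-sized values into a dictionary,
// accumulates the byte count in a local variable, and reports it to the
// tracker once after the loop instead of once per value.
int64_t LoadDictionary(const std::vector<std::string>& values,
                       std::vector<std::string>* dict, ToyTracker* tracker) {
  int64_t total_bytes = 0;
  for (const std::string& v : values) {
    dict->push_back(v);
    // Assumed accounting: fixed-size header plus the variable-sized payload.
    total_bytes += static_cast<int64_t>(sizeof(std::string) + v.size());
  }
  tracker->Consume(total_bytes);  // single walk up the parent chain
  return total_bytes;
}
```

A test in the spirit of the second comment would compute the expected size independently and compare it with consumption() on the tracker. In this sketch the updates() counter also shows that each tracker in the chain is touched once per dictionary, not once per value.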
