Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11575 )
Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader ...................................................................... Patch Set 2: (18 comments) http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h@661 PS4, Line 661: bool* skip_row_group) WARN_UNUSED_RESULT; Mention that these are called by the column readers? http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h@663 PS4, Line 663: These should probably be int64_t for good measure http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@354 PS4, Line 354: friend class ScannerContext; prefix with /// http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@356 PS4, Line 356: Make the byte counters int64_t? Also use a const ref? http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@541 PS4, Line 541: g, i.e., applies runtime filters to file _bytes_read_... http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@554 PS4, Line 554: /// Status::OK() and sets 'scan_range' to NULL when no more ranges are left to process. nit: Map from SlotId to pair... http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@557 PS4, Line 557: us Star nit: typo http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@558 PS4, Line 558: misses trailing _ http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc@907 PS4, Line 907: unexpected_remote_bytes_->Set(reader_context_->unexpected_remote_bytes()); loop per const ref, you can use (const auto& bytes_read : ..) if you'd like http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc@953 PS4, Line 953: In C++ you can access a map using [key] and if the entry doesn't exist it will be created and the default initializer will be invoked, which for int is 0. http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@726 PS4, Line 726: Shouldn't the [Un] be without [] here, i.e. we don't update the ParquetCompressed... one? http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@737 PS4, Line 737: nsions(cls): assert that len(compressed_page_size_summaries) > 0 ? http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@738 PS4, Line 738: super(TestScanRangeLengths, cls).add_test_dimensions() for value in summary: ? http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@768 PS4, Line 768: rcis remove the []? http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@789 PS4, Line 789: super(TestTextScanRangeLengths, cls).add_test_dimensions() With named matches and groupdicts, this could be summary['max'] See https://docs.python.org/2/library/re.html#re.MatchObject.groupdict http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@798 PS4, Line 798: vector.get_value('max_scan_range_length') This seems to be the same in all tests. Move to the helper method? You might want to use "[KM]?B" or "[BKM]+"? http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@811 PS4, Line 811: t I don't understand what the "s" is doing http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@813 PS4, Line 813: If you convert this into a verbose regex with named fields it might become easier to use and the code becomes easier to read, too. we have some similar code in parse_util.py, maybe you can move this to there, too? -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Thu, 01 Nov 2018 22:25:28 +0000 Gerrit-HasComments: Yes
