Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11575 )
Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet reader ...................................................................... Patch Set 14: (12 comments) http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/hdfs-scan-node-base.h@577 PS14, Line 577: std::unordered_map<SlotId, std::pair<std::atomic<int64_t>, std::atomic<int64_t>>> Use our own atomic implementations in common/atomic.h You could also consider using a proper struct instead of the pair to make its usage more readable. http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h File be/src/exec/parquet/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h@654 PS14, Line 654: Update nit: updates http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h@658 PS14, Line 658: Update nit: updates http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@808 PS14, Line 808: def test_page_size_counters_uncompressed(self, vector): I think you could also pull all (un)compressed tests into one method, e.g. test_per_page_counters_(un)compressed(), but I don't feel strongly about it. http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@822 PS14, Line 822: assert summary['max'] == summary['min'] == summary['avg'] == summary['samples'] == 0 maybe assert all(summary[k] == 0 for k in summary) ? You could also create a helper method for this http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@832 PS14, Line 832: assert summary['max'] > 0 and summary['min'] > 0 and summary['avg'] > 0 and \ maybe assert all(...) here, too? http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@839 PS14, Line 839: + I suspect the + might not be needed here. http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@844 PS14, Line 844: get_bytes_summary_stats_counter(summary_name, runtime_profile) nit: indent 4, or wrap after opening parentheses (see PEP8, I think it says to prefer parentheses for multilines instead of \ when possible) http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@859 PS14, Line 859: "ParquetCompressedBytesReadPerColumn", runtime_profile) nit: indent 4, maybe check everywhere else http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py File tests/util/parse_util.py: http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@152 PS14, Line 152: .*? This matches non-greedily until the opening (, right? You can also use [^\(] instead, might be slightly clear but I don't feel strongly. http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@152 PS14, Line 152: [.]*[0-9]* Then this only needs to match the number in (), right? so we don't need the decimal and numbers after. http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@155 PS14, Line 155: * Nit: here and elsewhere you likely mean "+" (1 or more) instead of "*" (zero or more) -- To view, visit http://gerrit.cloudera.org:8080/11575 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817 Gerrit-Change-Number: 11575 Gerrit-PatchSet: 14 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Comment-Date: Wed, 12 Dec 2018 21:49:07 +0000 Gerrit-HasComments: Yes
