Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11575 )

Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet 
reader
......................................................................


Patch Set 14:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/hdfs-scan-node-base.h@577
PS14, Line 577:   std::unordered_map<SlotId, std::pair<std::atomic<int64_t>, 
std::atomic<int64_t>>>
Use our own atomic implementations in common/atomic.h

You could also consider using a proper struct instead of the pair to make its 
usage more readable.


http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h
File be/src/exec/parquet/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h@654
PS14, Line 654: Update
nit: updates


http://gerrit.cloudera.org:8080/#/c/11575/14/be/src/exec/parquet/hdfs-parquet-scanner.h@658
PS14, Line 658: Update
nit: updates


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@808
PS14, Line 808:   def test_page_size_counters_uncompressed(self, vector):
I think you could also pull all (un)compressed tests into one method, e.g. 
test_per_page_counters_(un)compressed(), but I don't feel strongly about it.


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@822
PS14, Line 822:       assert summary['max'] == summary['min'] == summary['avg'] 
== summary['samples'] == 0
maybe

  assert all(summary[k] == 0 for k in summary)

?

You could also create a helper method for this


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@832
PS14, Line 832:       assert summary['max'] > 0 and summary['min'] > 0 and 
summary['avg'] > 0 and \
maybe

  assert all(...)

here, too?


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@839
PS14, Line 839: +
I suspect the + might not be needed here.


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@844
PS14, Line 844:         get_bytes_summary_stats_counter(summary_name, 
runtime_profile)
nit: indent 4, or wrap after opening parentheses (see PEP8, I think it says to 
prefer parentheses for multilines instead of \ when possible)


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/query_test/test_scanners.py@859
PS14, Line 859:       "ParquetCompressedBytesReadPerColumn", runtime_profile)
nit: indent 4, maybe check everywhere else


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py
File tests/util/parse_util.py:

http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@152
PS14, Line 152: .*?
This matches non-greedily until the opening (, right?

You can also use [^\(] instead, might be slightly clear but I don't feel 
strongly.


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@152
PS14, Line 152: [.]*[0-9]*
Then this only needs to match the number in (), right? so we don't need the 
decimal and numbers after.


http://gerrit.cloudera.org:8080/#/c/11575/14/tests/util/parse_util.py@155
PS14, Line 155: *
Nit: here and elsewhere you likely mean "+" (1 or more) instead of "*" (zero or 
more)



--
To view, visit http://gerrit.cloudera.org:8080/11575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817
Gerrit-Change-Number: 11575
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Wed, 12 Dec 2018 21:49:07 +0000
Gerrit-HasComments: Yes

Reply via email to