Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11575 )

Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet 
reader
......................................................................


Patch Set 2:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h@661
PS4, Line 661:       bool* skip_row_group) WARN_UNUSED_RESULT;
Mention that these are called by the column readers?


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-parquet-scanner.h@663
PS4, Line 663:
These should probably be int64_t for good measure


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@354
PS4, Line 354:   friend class ScannerContext;
prefix with ///


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@356
PS4, Line 356:
Make the byte counters int64_t? Also use a const ref?


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@541
PS4, Line 541: g, i.e., applies runtime filters to file
_bytes_read_...


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@554
PS4, Line 554:   /// Status::OK() and sets 'scan_range' to NULL when no more 
ranges are left to process.
nit: Map from SlotId to pair...


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@557
PS4, Line 557: us Star
nit: typo


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.h@558
PS4, Line 558:
misses trailing _


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc@907
PS4, Line 907:     
unexpected_remote_bytes_->Set(reader_context_->unexpected_remote_bytes());
loop per const ref, you can use (const auto& bytes_read : ..) if you'd like


http://gerrit.cloudera.org:8080/#/c/11575/4/be/src/exec/hdfs-scan-node-base.cc@953
PS4, Line 953:
In C++ you can access a map using [key] and if the entry doesn't exist it will 
be created and the default initializer will be invoked, which for int is 0.


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@726
PS4, Line 726:
Shouldn't the [Un] be without [] here, i.e. we don't update the 
ParquetCompressed... one?


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@737
PS4, Line 737: nsions(cls):
assert that len(compressed_page_size_summaries) > 0 ?


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@738
PS4, Line 738:     super(TestScanRangeLengths, cls).add_test_dimensions()
for value in summary:  ?


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@768
PS4, Line 768: rcis
remove the []?


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@789
PS4, Line 789:     super(TestTextScanRangeLengths, cls).add_test_dimensions()
With named matches and groupdicts, this could be summary['max']

See https://docs.python.org/2/library/re.html#re.MatchObject.groupdict


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@798
PS4, Line 798:         vector.get_value('max_scan_range_length')
This seems to be the same in all tests. Move to the helper method?

You might want to use "[KM]?B" or "[BKM]+"?


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@811
PS4, Line 811: t
I don't understand what the "s" is doing


http://gerrit.cloudera.org:8080/#/c/11575/4/tests/query_test/test_scanners.py@813
PS4, Line 813:
If you convert this into a verbose regex with named fields it might become 
easier to use and the code becomes easier to read, too.

we have some similar code in parse_util.py, maybe you can move this to there, 
too?



--
To view, visit http://gerrit.cloudera.org:8080/11575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817
Gerrit-Change-Number: 11575
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 01 Nov 2018 22:25:28 +0000
Gerrit-HasComments: Yes

Reply via email to