Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 15:

(6 comments)

I did a pass up to      be/src/exec/parquet/parquet-page-index-test.cc of the 
changes, thanks for all the fixes, it is definitely simpler to follow

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/hdfs-parquet-scanner.cc@708
PS7, Line 708:   *is_null_page = column_index.null_pages[page_idx];
> Thanks for the tips!
Thanks for your patience with this, it is more readable now.


http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/hdfs-parquet-scanner.cc@1170
PS15, Line 1170:     // Scalar readers that read collections can be ahead of 
other readers by 1.
:(

This stuff got too complicated.


http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12065/15/be/src/exec/parquet/parquet-column-readers.h@533
PS15, Line 533: isn't anymore
pedantry: aren't any more


http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.h@529
PS7, Line 529:
> Added DCHECKs for 'current_row_range_' here and above.
Thanks, the new name is better than what I suggested.


http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-column-readers.cc@512
PS7, Line 512:       if (IsLastCandidateRange()) {
> Yeah, num_buffered_values_ == 0 is properly handled by NextPage(), and here
Yeah exactly, the bit-packed runs in Parquet's RLE encoding can have spurious 
0's at the end - it doesn't store the # of values in band.


http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc
File be/src/exec/parquet/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/12065/7/be/src/exec/parquet/parquet-common.cc@96
PS7, Line 96:       // We found a gap in 'skip_ranges', i.e. a row range that 
is not covered by
> Added a DCHECK(std::is_sorted), this might be less efficient in debug build
Yeah that seems fine, although we avoided the problem anyway :). I think it's 
fine to have an O(n) DCHECK in an O(n) function.



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Sat, 20 Apr 2019 01:12:41 +0000
Gerrit-HasComments: Yes

Reply via email to