Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
......................................................................


Patch Set 17:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@464
PS17, Line 464: bool GetRequiredStatsField(const string& fn_name,
> I think this could go into parquet-column-stats.{h,cc}.
Done


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@546
PS17, Line 546:     ColumnStatsReader::StatsField stats_field = 
ColumnStatsReader::StatsField::MIN;
> I think we usually don't initialize output parameters to make it clear that
Hmm, I thought clang-tidy didn't like that but it must have been something else 
because it doesn't complain now.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/hdfs-parquet-scanner.cc@643
PS17, Line 643:       // We don't need the raw page index buffers anymore.
              :       page_index_.Release();
> Can this go to ProcessPageIndex? It already touches a bunch of other state.
ProcessPageIndex() has some RETURN_IF_ERROR macros and I wanted to be sure 
about calling Release().

However, I realised that I can just use a scope exit trigger in 
ProcessPageIndex().


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h
File be/src/exec/parquet/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h@400
PS17, Line 400:   /// True, if we are using NextLevels() to readahead the next 
def and rep levels. In this
> I feel that this field needs more explanation. From just looking at the com
Elaborated the comment.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-column-readers.h@403
PS17, Line 403:   bool levels_readahead_ = false;
> Would it simplify the code to make this levels_read_ahead_offset_ (being -1
Since currently there are only two possibilities (-1 or 0), having a simple 
flag is more exact I think.

Also, we'd still need an if stmt when we have processed all the rows, because 
in this case we don't need to adjust the value of 'current_row_'.


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index-test.cc
File be/src/exec/parquet/parquet-page-index-test.cc:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index-test.cc@58
PS17, Line 58: void ValidatePageIndexRange(const RowGroupRanges& 
row_group_ranges,
> Add a comment for this one, too?
Done


http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index.h
File be/src/exec/parquet/parquet-page-index.h:

http://gerrit.cloudera.org:8080/#/c/12065/17/be/src/exec/parquet/parquet-page-index.h@56
PS17, Line 56: if there's at least parts of the page index are present
> nit: grammar
Rephrased the whole comment.


http://gerrit.cloudera.org:8080/#/c/12065/17/tests/query_test/test_parquet_stats.py
File tests/query_test/test_parquet_stats.py:

http://gerrit.cloudera.org:8080/#/c/12065/17/tests/query_test/test_parquet_stats.py@87
PS17, Line 87:     for batch_size in [0, 1]:
> Should we use a proper test dimension for the batch size, e.g. like in test
Yeah, I looked at that earlier, but I didn't want to run the other tests with 
those batch sizes, neither wanted to add if statements for each.

And I use different batch sizes here and at L97 which would be a bit more 
complicated with the other approach.

Also, it has the advantage to only load the data once.

On the other hand, I agree that this is not the cleanest solution, so I can 
change it if you feel strong about it.



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 May 2019 14:05:35 +0000
Gerrit-HasComments: Yes

Reply via email to