Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG
Commit Message:

PS1, Line 10: statistic
> How about "we use the Parquet field RowGroup.num_rows"?
Works for me.


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 440:       *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows;
> There's also FileMetaData::num_rows. Can't we use that instead of looping o
We could, but not sure it's worth it. One scanner does not necessarily process 
an entire Parquet file, so we'd need to make sure that exactly one scanner 
thread deals with the entire file just for this special case. Taras, maybe you 
can take a look and see how invasive that would be?
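
To make the trade-off concrete, here is a rough sketch with simplified stand-in structs (RowGroupMeta, FileMeta, CountRowsForScanner and MakeExampleFile are hypothetical names for illustration, not Impala's actual classes). Each scanner sums num_rows only over the row groups it was assigned, so the partial counts add up to the file total without any coordination between scanners:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the Parquet footer structures (hypothetical,
// not the real Impala or Parquet Thrift types).
struct RowGroupMeta {
  int64_t num_rows;
};

struct FileMeta {
  int64_t num_rows;  // Total number of rows in the file.
  std::vector<RowGroupMeta> row_groups;
};

// Per-row-group approach: a scanner adds up num_rows only for the row
// groups assigned to it, so partial counts from several scanners can be
// summed without double counting.
int64_t CountRowsForScanner(const FileMeta& file,
                            const std::vector<size_t>& assigned_row_groups) {
  int64_t count = 0;
  for (size_t idx : assigned_row_groups) {
    count += file.row_groups[idx].num_rows;
  }
  return count;
}

// Builds a small example file whose file-level total equals the sum of
// its row-group counts.
FileMeta MakeExampleFile() {
  FileMeta file;
  file.row_groups = std::vector<RowGroupMeta>{
      RowGroupMeta{100}, RowGroupMeta{250}, RowGroupMeta{50}};
  file.num_rows = 400;
  return file;
}
```

If two scanners split this file (row groups 0 and 1 versus row group 2), their partial counts sum to the file-level num_rows. If each scanner returned the file-level num_rows instead, the file would be counted twice, which is why that approach would need a guarantee that exactly one scanner handles each file.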


Line 1455:     // Column readers are not needed because we are not reading from any columns if this
> Can we then optimize something like 
The transformation is only valid if l_comment is non-nullable. We have no 
concept of nullability for HDFS tables.
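
A small illustration of why nullability matters, assuming the suggestion was to treat a count over a column such as l_comment like count(*). The Cell struct and both functions below are made up for this example and do not correspond to Impala code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One value of a nullable string column (illustrative only).
struct Cell {
  bool is_null;
  std::string value;
};

// count(*): every row counts, whether or not the column value is NULL.
int64_t CountStar(const std::vector<Cell>& column) {
  return static_cast<int64_t>(column.size());
}

// count(col): only rows with a non-NULL value count.
int64_t CountColumn(const std::vector<Cell>& column) {
  int64_t count = 0;
  for (const Cell& cell : column) {
    if (!cell.is_null) ++count;
  }
  return count;
}
```

The two results agree only when the column contains no NULLs. Row-group num_rows corresponds to CountStar, so it cannot stand in for a count over a column unless that column is known to be non-nullable.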


http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> i don't know what this means.
Personally, I prefer to show what is actually being executed in the explain 
plan. Otherwise, if something goes wrong it could be hard to debug because we 
do not know which code path it is taking.

Do you have an alternative proposal for showing that the optimized path is 
being taken? How would we debug/support/test this feature? How will users 
understand their query plan?

Let's start a thread/doc about these usability/supportability issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/6812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-HasComments: Yes
