Alex Behm has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG
Commit Message:

PS1, Line 10: statistic
> How about "we use the Parquet field RowGroup.num_rows"?
Works for me.


http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 440: *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows;
> There's also FileMetaData::num_rows. Can't we use that instead of looping o
We could, but I'm not sure it's worth it. One scanner does not necessarily process an entire Parquet file, so we'd need to make sure that exactly one scanner thread deals with the entire file just for this special case. Taras, maybe you can take a look and see how invasive that would be?


Line 1455: // Column readers are not needed because we are not reading from any columns if this
> Can we then optimize something like
The transformation is only valid if l_comment is non-nullable. We have no concept of nullability for HDFS tables.


http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: | output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> i don't know what this means.
Personally, I prefer to show what is actually being executed in the explain plan. Otherwise, if something goes wrong, it could be hard to debug because we would not know which code path is being taken. Do you have an alternative proposal for showing that the optimized path is being taken? How would we debug/support/test this feature? How will users understand their query plan? Let's start a thread/doc about these usability/supportability issues.
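For readers following the Line 440 and Line 34 threads, here is a minimal C++ sketch of the count-star idea under stated assumptions. Each scanner emits RowGroup.num_rows for the row groups it owns instead of reading any column data, and the aggregation sums those values with a function that returns 0 rather than NULL on empty input, which is what the name sum_zero_if_empty suggests. It also illustrates why per-row-group counts sidestep the FileMetaData::num_rows concern: when a file is split across several scan ranges, each scanner contributes only the row groups it owns. The structs RowGroupMeta and ScanRange and the functions RowGroupInRange, ScanRowGroupCounts, Sum, SumZeroIfEmpty and CountStarFromMetadata are hypothetical simplifications, not Impala's or Parquet's actual types. The midpoint rule for assigning each row group to exactly one scan range is likewise an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for one row group's footer metadata.
struct RowGroupMeta {
  int64_t file_offset;      // byte offset where the row group starts
  int64_t total_byte_size;  // size of the row group in bytes
  int64_t num_rows;         // plays the role of Parquet's RowGroup.num_rows
};

// Hypothetical byte range of a file handed to one scanner.
struct ScanRange {
  int64_t offset;
  int64_t length;
};

// ASSUMED ownership rule: a row group belongs to the scan range that contains
// its midpoint, so each row group is counted exactly once even when the file
// is split across several scan ranges.
bool RowGroupInRange(const RowGroupMeta& rg, const ScanRange& range) {
  const int64_t mid = rg.file_offset + rg.total_byte_size / 2;
  return mid >= range.offset && mid < range.offset + range.length;
}

// One scanner's output: one num_rows value per owned row group, taken from
// metadata only; no column data is read.
std::vector<int64_t> ScanRowGroupCounts(const std::vector<RowGroupMeta>& row_groups,
                                        const ScanRange& range) {
  std::vector<int64_t> counts;
  for (const RowGroupMeta& rg : row_groups) {
    if (RowGroupInRange(rg, range)) counts.push_back(rg.num_rows);
  }
  return counts;
}

// SQL-style SUM: NULL (std::nullopt) when there is no input at all.
std::optional<int64_t> Sum(const std::vector<int64_t>& values) {
  if (values.empty()) return std::nullopt;
  int64_t total = 0;
  for (int64_t v : values) total += v;
  return total;
}

// SUM variant that yields 0 on empty input, as COUNT(*) must for an empty table.
int64_t SumZeroIfEmpty(const std::vector<int64_t>& values) {
  return Sum(values).value_or(0);
}

// Gathers every scanner's per-row-group counts and merges them into COUNT(*).
int64_t CountStarFromMetadata(const std::vector<RowGroupMeta>& row_groups,
                              const std::vector<ScanRange>& ranges) {
  std::vector<int64_t> all_counts;
  for (const ScanRange& range : ranges) {
    std::vector<int64_t> counts = ScanRowGroupCounts(row_groups, range);
    all_counts.insert(all_counts.end(), counts.begin(), counts.end());
  }
  return SumZeroIfEmpty(all_counts);
}
```

With three 100-byte row groups at offsets 0, 100 and 200 holding 1000, 2500 and 500 rows, and the 300-byte file split into two 150-byte scan ranges, the first range owns one row group and the second owns two, so CountStarFromMetadata returns 4000. This shortcut only answers COUNT(*). As the Line 1455 reply points out, rewriting COUNT(l_comment) the same way would only be valid if the column were non-nullable.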
-- 
To view, visit http://gerrit.cloudera.org:8080/6812

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-HasComments: Yes