Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization ......................................................................
Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/6812/4/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 455: // There are no materialized slots, e.g. > There are no materialized slots and we are not optimizing count(*) Done Line 456: // "select count(*) from alltypes where int_col > 5" and "select 1 from alltypes". > The first query is not correct, in that case int_col is materialized. I thi Done http://gerrit.cloudera.org:8080/#/c/6812/4/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 171: if (slot.getColumn() != null && !slot.getStats().hasStats()) return true; > Thanks. This was a bug in general, not really specific to your change. Agreed. http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test: Line 105: from functional_parquet.alltypes > single line query Done Line 150: # The optimization is disabled if the output of the count(*) inline view is being > Optimization is not applied because the inner count(*) is not materialized. Done Line 285: # Optimization is not applied when selecting from an empty table. > Do we apply the optimization when we reference a non-empty partitioned tabl We do not. Added a test case for this. Line 323: # materialized. Not materialized agg exprs are ignored. > Non-materialized agg exprs are ignored. Done http://gerrit.cloudera.org:8080/#/c/6812/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats-agg.test: Line 86: # Verify that 0 is returned when we are selecting from an empty table and the optimization > the optimization is not applied in this case right? Yes, that's true, but it *could* be applied. Reworded slightly, still not sure if the phrasing makes sense now. Line 95: # Verify that 0 is returned when all the partitioned columns are filtered. > when all partitions are pruned Done http://gerrit.cloudera.org:8080/#/c/6812/4/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: Line 275: if (vector.get_value('table_format').file_format != 'text' or > seems weird, why not if we set it to parquet then this test will not run as part of core at all. It will only run as part of exhaustive. Is that what we want? Line 280: vector.get_value('exec_option')['batch_size'] = 1 > Doesn't exhaustive run this test with multiple batch sizes already? If so, No, I checked, batch size does not vary for this test in exhaustive -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes