Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21860 )
Change subject: IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality
......................................................................

Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test:

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test@48
PS4, Line 48: |  tuple-ids=5 row-size=214B cardinality=108.00K cost=1050713
Why are the old and new values so much higher than with cpu cost disabled?


http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

http://gerrit.cloudera.org:8080/#/c/21860/4/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test@384
PS4, Line 384: |  row-size=40B cardinality=10
The actual output of this run is pretty weird:

F02:ROOT                       1  1  271.000us  271.000us                  4.01 MB    4.00 MB
13:MERGING-EXCHANGE            1  1  22.027us   22.027us   10      10      48.00 KB   16.00 KB   UNPARTITIONED
F01:EXCHANGE SENDER            3  3  31.969us   37.302us                   3.94 KB    176.00 KB
10:TOP-N                       3  3  138.518us  201.601us  30      10      16.00 KB   400.00 B
12:AGGREGATE                   3  3  5.512ms    5.788ms    11.62K  10      34.04 MB   128.00 MB  FINALIZE
11:EXCHANGE                    3  3  38.321us   48.434us   11.62K  10      176.00 KB  16.00 KB   HASH(o_orderkey,o_orderdate,o_shippriority)
F00:EXCHANGE SENDER            3  3  811.854us  1.367ms                    253.19 KB  528.00 KB
09:AGGREGATE                   3  3  5.399ms    9.315ms    11.62K  10      44.05 MB   128.00 MB  STREAMING
01:SUBPLAN                     3  3  1.654ms    3.320ms    0       3.00M   10.00 MB   0
|--08:NESTED LOOP JOIN         3  3  3.806ms    7.477ms    0       100     32.00 KB   33.00 B    CROSS JOIN
|  |--02:SINGULAR ROW SRC      3  3  0.000ns    0.000ns    0       1       0          0
|  04:SUBPLAN                  3  3  3.018ms    5.988ms    0       100     8.00 KB    0
|  |--07:NESTED LOOP JOIN      3  3  4.807ms    9.485ms    0       10      24.00 KB   36.00 B    CROSS JOIN
|  |  |--05:SINGULAR ROW SRC   3  3  0.000ns    0.000ns    0       1       0          0
|  |  06:UNNEST                3  3  659.021us  1.316ms    0       10      0          0          o.o_lineitems l
|  03:UNNEST                   3  3  371.777us  737.710us  0       10      0          0          c.c_orders o
00:SCAN HDFS                   3  3  212.737ms  298.593ms  8.64K   30.00K  93.27 MB   616.00 MB  tpch_nested_parquet.customer c

We get to the same result at the top, but we have a subplan that expects to take in 100 rows producing 3M, but in actuality both process 0 rows. Then we estimate 10 rows for 09:AGGREGATE but get 11.62K.

The old and new estimates are both wildly wrong; I'm not sure why we end up underestimating now though. That seems like possibly a bug in this implementation.


--
To view, visit http://gerrit.cloudera.org:8080/21860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7
Gerrit-Change-Number: 21860
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 03 Oct 2024 17:43:49 +0000
Gerrit-HasComments: Yes
