Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#15).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem
Est. Peak Mem Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE 1 224.721us 224.721us 1 1 28.00 KB
-1.00 B FINALIZE
12:EXCHANGE 1 24.578us 24.578us 3 1 0
-1.00 B UNPARTITIONED
11:AGGREGATE 3 2s402ms 3s060ms 3 1 119.00 KB
10.00 MB
00:UNION 3 35s380ms 37s846ms 288.01M 288.01M 3.08 MB
0
|--02:SCAN HDFS 3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB
1.88 GB store_sales_unpartitioned
|--03:SCAN HDFS 3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB
1.88 GB store_sales_unpartitioned
|--04:SCAN HDFS 3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB
1.88 GB store_sales_unpartitioned
|--05:SCAN HDFS 3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB
1.88 GB store_sales_unpartitioned
|--06:SCAN HDFS 3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB
1.88 GB store_sales_unpartitioned
|--07:SCAN HDFS 3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB
1.88 GB store_sales_unpartitioned
|--08:SCAN HDFS 3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB
1.88 GB store_sales_unpartitioned
|--09:SCAN HDFS 3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB
1.88 GB store_sales_unpartitioned
|--10:SCAN HDFS 3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB
1.88 GB store_sales_unpartitioned
01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB
1.88 GB store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem
Est. Peak Mem Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE 1 187.474us 187.474us 1 1 28.00 KB
-1.00 B FINALIZE
12:EXCHANGE 1 15.238us 15.238us 3 1 0
-1.00 B UNPARTITIONED
11:AGGREGATE 3 2s958ms 3s749ms 3 1 3.08 MB
10.00 MB
00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0
0
|--02:SCAN HDFS 3 1s637ms 1s734ms 28.80M 28.80M 528.68 MB
1.88 GB store_sales_unpartitioned
|--03:SCAN HDFS 3 1s697ms 1s708ms 28.80M 28.80M 528.48 MB
1.88 GB store_sales_unpartitioned
|--04:SCAN HDFS 3 1s681ms 1s748ms 28.80M 28.80M 529.34 MB
1.88 GB store_sales_unpartitioned
|--05:SCAN HDFS 3 1s665ms 1s756ms 28.80M 28.80M 533.81 MB
1.88 GB store_sales_unpartitioned
|--06:SCAN HDFS 3 1s675ms 1s800ms 28.80M 28.80M 530.70 MB
1.88 GB store_sales_unpartitioned
|--07:SCAN HDFS 3 1s677ms 1s759ms 28.80M 28.80M 525.95 MB
1.88 GB store_sales_unpartitioned
|--08:SCAN HDFS 3 1s621ms 1s790ms 28.80M 28.80M 534.64 MB
1.88 GB store_sales_unpartitioned
|--09:SCAN HDFS 3 1s684ms 1s743ms 28.80M 28.80M 528.55 MB
1.88 GB store_sales_unpartitioned
|--10:SCAN HDFS 3 1s528ms 1s771ms 28.80M 28.80M 533.70 MB
1.88 GB store_sales_unpartitioned
01:SCAN HDFS 3 1s853ms 2s149ms 28.80M 28.80M 526.53 MB
1.88 GB store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,343 insertions(+), 611 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/15
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>