Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-4883: Union Codegen
......................................................................
Patch Set 5:
I reran the benchmark on patch set 5 against a larger table, selecting only
one column:
SELECT COUNT(c)
FROM (
  SELECT fnv_hash(ss_sold_time_sk) c
  FROM tpcds_10_parquet.store_sales_unpartitioned_big
  UNION ALL
  SELECT fnv_hash(ss_sold_time_sk) c
  FROM tpcds_10_parquet.store_sales_unpartitioned_big
  UNION ALL
  SELECT fnv_hash(ss_sold_time_sk) c
  FROM tpcds_10_parquet.store_sales_unpartitioned_big
  UNION ALL
  SELECT fnv_hash(ss_sold_time_sk) c
  FROM tpcds_10_parquet.store_sales_unpartitioned_big
) t
Before: 17.6s
After: 9.98s
That is about 1.76x faster, which is not a huge difference. I think the
bottleneck is the scan rather than the union, which is why the improvement is
not bigger. Maybe the difference will be more significant on a large cluster?
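
As a rough check on the numbers above, here is a minimal sketch of the
arithmetic. The two timings are the ones reported in this comment. The
amdahl_speedup helper and the 50% / 4x split are hypothetical, chosen only to
illustrate why speeding up one operator gives a limited overall gain when
another operator (here, presumably the scan) takes much of the runtime. They
are not measurements.

```python
# Timings reported above, in seconds.
before_s = 17.6
after_s = 9.98

speedup = before_s / after_s          # overall speedup factor
reduction = 1 - after_s / before_s    # fraction of the runtime removed


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime gets `factor` times faster.

    Hypothetical helper: a plain Amdahl's-law estimate, not taken from Impala.
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


print(f"speedup: {speedup:.2f}x, runtime reduction: {reduction:.1%}")

# Hypothetical split (not measured): if the union were 50% of the runtime and
# codegen made it 4x faster, the whole query would only get 1.6x faster.
print(f"hypothetical overall speedup: {amdahl_speedup(0.5, 4.0):.2f}x")
```

Under this model, the measured 43% reduction is also a lower bound on the
share of the original runtime spent in code that the patch affects. A query
profile would be needed to confirm how the time actually splits between scan
and union.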
--
To view, visit http://gerrit.cloudera.org:8080/6459
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No