Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
......................................................................


Patch Set 5:

I reran the benchmark on patch set 5 against a larger table, selecting only one 
column:
    SELECT
      COUNT(c)
    FROM (
      select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big
      union all
      select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big
      union all
      select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big
      union all
      select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big
    ) t

Before: 17.6s
After: 9.98s
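
For reference, a quick sketch of what those two timings work out to as a ratio. Only the two numbers come from the run above; the variable names are illustrative, and this is a single measurement, so treat the result as approximate.

```python
# Timings reported above, in seconds (one run each, so only approximate).
before_s = 17.6
after_s = 9.98

# Ratio of the old runtime to the new runtime.
speedup = before_s / after_s

# Share of the original runtime that the patch removed, in percent.
reduction_pct = (before_s - after_s) / before_s * 100

print(f"speedup: {speedup:.2f}x")
print(f"runtime reduction: {reduction_pct:.1f}%")
```

That comes to about 1.76x, or roughly 43% less wall-clock time for this query.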

Not a huge difference. I think the bottleneck is the scan rather than the 
union, which is why the improvement is not bigger. Maybe the difference would 
be more significant on a large cluster?

-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No
