Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12519 )


Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................

IMPALA-8214: Fix bad plan in load_nested.py

The previous plan put the larger input on the build side of the join and
used a broadcast join, so the larger input was copied to every node
executing the join. This is very suboptimal.
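
To illustrate why the build-side choice and join strategy matter, here is
a rough back-of-the-envelope sketch. It is a simplified cost model, not
Impala's planner logic; the function names, the input sizes and the node
count are all hypothetical, and the partitioned join is shown only as one
possible alternative plan shape.

```python
# Illustrative cost model only; NOT Impala's actual planner logic.
# All function names and the example sizes below are hypothetical.

def broadcast_join_cost(build_bytes, num_nodes):
    """Broadcast join: every node receives and stores the whole build side."""
    return {
        "network_bytes": build_bytes * num_nodes,
        "per_node_memory_bytes": build_bytes,
    }


def partitioned_join_cost(build_bytes, probe_bytes, num_nodes):
    """Partitioned (shuffle) join: both inputs are hash-partitioned once,
    and each node stores only its share of the build side."""
    return {
        "network_bytes": build_bytes + probe_bytes,
        "per_node_memory_bytes": build_bytes // num_nodes,
    }


MB = 1024 * 1024
large_input, small_input, nodes = 10000 * MB, 100 * MB, 10

# Larger input on the build side of a broadcast join.
bad_plan = broadcast_join_cost(build_bytes=large_input, num_nodes=nodes)

# Smaller input on the build side of a partitioned join.
better_plan = partitioned_join_cost(
    build_bytes=small_input, probe_bytes=large_input, num_nodes=nodes)

print(bad_plan)
print(better_plan)
```

Under these assumed numbers, broadcasting the larger input moves roughly
ten times as much data over the network and needs far more memory per
node, which is in line with the memory reduction described below.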

This speeds up data loading on my minicluster (18s vs. 31s). It has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced, and
the data loading can potentially be broken up into fewer chunks.

I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.

Testing:
Ran core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.

Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/12519/3
--
To view, visit http://gerrit.cloudera.org:8080/12519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
