Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12519 )

Change subject: IMPALA-8214: Fix bad plan in load_nested.py
......................................................................

IMPALA-8214: Fix bad plan in load_nested.py

The previous plan had the larger input on the build side of the join and
did a broadcast join, which is very suboptimal.

This speeds up data loading on my minicluster - 18s vs 31s and has a
more significant impact on a real cluster, where queries execute
much faster, the memory requirement is significantly reduced and
the data loading can potentially be broken up into fewer chunks.

I also considered computing stats on the table to let Impala generate
the same plan, but this achieves the same goal more efficiently.

Testing:
Run core tests. Resource estimates in planner tests changed slightly
because of the different distribution of data.

Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
---
M testdata/bin/load_nested.py
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
3 files changed, 12 insertions(+), 12 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/12519/3
--
To view, visit http://gerrit.cloudera.org:8080/12519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I55e0ca09590a90ba530efe4e8f8bf587dde3eeeb
Gerrit-Change-Number: 12519
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
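
Illustrative note on the join-plan reasoning in the commit message: the sketch below is a deliberately simplified network-cost model, not Impala's actual planner code. The function names, the byte counts and the node count are all hypothetical, chosen only to show why broadcasting the larger input is expensive. It assumes a broadcast join ships a full copy of the build side to every node, while a partitioned (shuffle) join ships each side across the network roughly once.

```python
def broadcast_cost(build_bytes: int, num_nodes: int) -> int:
    """Approximate bytes sent when the build side is copied to every node."""
    return build_bytes * num_nodes


def partitioned_cost(build_bytes: int, probe_bytes: int) -> int:
    """Approximate bytes sent when both sides are hash-partitioned on the join key."""
    return build_bytes + probe_bytes


def cheaper_strategy(build_bytes: int, probe_bytes: int, num_nodes: int):
    """Return (strategy_name, estimated_bytes) for the cheaper of the two options."""
    b = broadcast_cost(build_bytes, num_nodes)
    p = partitioned_cost(build_bytes, probe_bytes)
    return ("broadcast", b) if b <= p else ("partitioned", p)


# Hypothetical sizes: one large input, one small input, a 3-node cluster.
LARGE, SMALL, NODES = 1000, 10, 3

# Larger input on the build side, broadcast to every node (the shape of the old plan).
bad_plan = broadcast_cost(LARGE, NODES)

# Same build side, but letting the model pick the cheaper distribution.
with_large_build = cheaper_strategy(LARGE, SMALL, NODES)

# Smaller input on the build side instead.
with_small_build = cheaper_strategy(SMALL, LARGE, NODES)

print(bad_plan, with_large_build, with_small_build)
```

In this toy model the build side also stands in for memory use, since a hash join keeps the build side in an in-memory hash table and a broadcast join keeps a full copy on each node. That is consistent with the reduced memory requirement mentioned in the commit message, though the real estimates come from the planner and are not reproduced here.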
