Hello Abhishek Rawat, Zoltan Borok-Nagy, David Rorke, Wenzhe Zhou, Impala
Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21927
to look at the new patch set (#7).
Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................
IMPALA-13445: Ignore num partition for unpartitioned writes
When cost-based planning is used, writer parallelism is limited by the
number of partitions. In the unpartitioned insert scenario, there is
just a single partition. That leads to a single fs writer, which
causes slow writes.
This patch fixes the issue by distinguishing between partitioned and
unpartitioned inserts, and causes the following BEHAVIOR CHANGE if
COMPUTE_PROCESSING_COST=1:
1. If the insert is unpartitioned, use the byte-based estimate fully.
Shuffling should only happen if the number of writers is less than the
number of input fragment instances.
2. If the insert is partitioned, try to plan at least one writer for
each shuffling executor node, but do not exceed the number of
partitions. However, if the number of partitions is 1, try to force
writer colocation with the input fragment.
Both partitioned and unpartitioned inserts still respect the
MAX_FS_WRITERS option. This patch also does minor cleanup in
DistributedPlanner.java.
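The two rules above can be read as a small decision function. The
following is only a rough Python sketch of that reading, not the code in
DistributedPlanner.java: the names plan_writers, WriterPlan, and every
parameter are hypothetical, combining the byte-based estimate with the
node count via max() is an assumption, and so is treating a
max_fs_writers value of 0 as "no limit".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriterPlan:
    """Hypothetical result type: target writer count and whether to shuffle."""
    num_writers: int
    shuffle: bool


def _cap(n, max_fs_writers):
    # Assumption: 0 (or less) means "no limit"; otherwise clamp to the option.
    n = max(1, n)
    return min(n, max_fs_writers) if max_fs_writers > 0 else n


def plan_writers(partitioned, num_partitions, byte_based_writers,
                 num_input_instances, num_shuffle_nodes, max_fs_writers=0):
    """Sketch of the writer-count rules described in the commit message."""
    if not partitioned:
        # Rule 1: use the byte-based estimate fully; shuffle only when
        # fewer writers than input fragment instances are wanted.
        writers = _cap(byte_based_writers, max_fs_writers)
        return WriterPlan(writers, shuffle=writers < num_input_instances)
    if num_partitions == 1:
        # Rule 2, special case: a single partition, so try to colocate the
        # writer with the input fragment instead of shuffling. The sketch
        # reports the partition-bound target of 1 and does not model how
        # many sink instances the input fragment actually runs.
        return WriterPlan(1, shuffle=False)
    # Rule 2: at least one writer per shuffling executor node, but never
    # more writers than partitions. (Simplification: always shuffle here.)
    writers = min(max(byte_based_writers, num_shuffle_nodes), num_partitions)
    return WriterPlan(_cap(writers, max_fs_writers), shuffle=True)
```

For example, under these assumptions an unpartitioned insert with a
byte-based estimate of 8 writers and 4 input fragment instances keeps all
8 writers and adds no shuffle, while the same insert with an estimate of
2 would shuffle down to 2 writers.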
Testing:
- In test_executor_groups.py, move insert tests from
test_query_cpu_count_divisor_default into a separate
test_query_cpu_count_on_insert. Add some new insert test cases there.
- Add and pass CardinalityTest.testByteBasedNumWriters().
- Add new planner tests under TpcdsCpuCostPlannerTest.
- Pass test_executor_groups.py.
Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
---
M be/src/service/query-state-record.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M tests/custom_cluster/test_executor_groups.py
10 files changed, 2,572 insertions(+), 110 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/21927/7
--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>