Hello Abhishek Rawat, Zoltan Borok-Nagy, David Rorke, Wenzhe Zhou, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21927

to look at the new patch set (#6).

Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................

IMPALA-13445: Ignore num partition for unpartitioned writes

When cost-based planning is used, writer parallelism is limited by the
number of partitions. An unpartitioned insert has just a single
partition, so it is planned with only a single fs writer, which causes
slow writes.

This patch fixes the issue by distinguishing between partitioned and
unpartitioned inserts, which causes the following BEHAVIOR CHANGE if
COMPUTE_PROCESSING_COST=1:

1. If the insert is unpartitioned, use the byte-based estimate fully.
   Shuffling should only happen if the number of writers is less than
   the number of input fragment instances.
2. If the insert is partitioned, try to plan at least one writer for
   each shuffling executor node, but do not exceed the number of
   partitions. However, if the number of partitions is 1, try to force
   writer colocation with the input fragment.

Both partitioned and unpartitioned inserts still respect the
MAX_FS_WRITERS option. This patch also does minor cleanup in
DistributedPlanner.java.
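
The two rules above, together with the MAX_FS_WRITERS cap, can be read
roughly as the following decision procedure. This is only an
illustrative sketch and not the code in DistributedPlanner.java: the
class, the method names, and all parameters (byteBasedWriters,
numInputInstances, numExecutors) are hypothetical, a cap of 0 is
assumed to mean "no limit", and hints or already-compatible input
partitioning are ignored.

```java
// Hypothetical sketch of the writer-count rules; not the actual planner code.
final class WriterParallelismSketch {
  /** Outcome of the sketch: how many writers, and whether input is shuffled. */
  static final class Plan {
    final int numWriters;
    final boolean shuffle;
    Plan(int numWriters, boolean shuffle) {
      this.numWriters = numWriters;
      this.shuffle = shuffle;
    }
  }

  /** Applies the writer cap (assumed: 0 means unlimited); never returns < 1. */
  static int cap(int n, int maxFsWriters) {
    int capped = maxFsWriters > 0 ? Math.min(n, maxFsWriters) : n;
    return Math.max(1, capped);
  }

  static Plan plan(boolean partitioned, long numPartitions, int byteBasedWriters,
      int numInputInstances, int numExecutors, int maxFsWriters) {
    if (!partitioned) {
      // Rule 1: trust the byte-based estimate; shuffle only when it asks for
      // fewer writers than there are input fragment instances.
      int target = cap(byteBasedWriters, maxFsWriters);
      if (target < numInputInstances) return new Plan(target, true);
      // Otherwise the writers stay colocated with the input instances.
      return new Plan(numInputInstances, false);
    }
    // Rule 2, special case: a single partition gains nothing from a shuffle,
    // so colocate with the input fragment when the cap allows it (assumption).
    if (numPartitions == 1
        && cap(numInputInstances, maxFsWriters) == numInputInstances) {
      return new Plan(numInputInstances, false);
    }
    // Rule 2: at least one writer per shuffling executor node, but never
    // more writers than partitions.
    int target = Math.max(byteBasedWriters, numExecutors);
    target = (int) Math.min((long) target, Math.max(1L, numPartitions));
    return new Plan(cap(target, maxFsWriters), true);
  }
}
```

For example, under these assumptions an unpartitioned insert with a
byte-based estimate of 8 writers and 12 input instances is shuffled to
8 writers, while the same insert with only 4 input instances keeps its
writers colocated with the input fragment.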

Testing:
- In test_executor_groups.py, move insert tests from
  test_query_cpu_count_divisor_default into a separate
  test_query_cpu_count_on_insert. Add some new insert test cases there.
- Add and pass CardinalityTest.testByteBasedNumWriters().
- Add new planner tests under TpcdsCpuCostPlannerTest.
- Pass test_executor_groups.py.

Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
---
M be/src/service/query-state-record.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M tests/custom_cluster/test_executor_groups.py
10 files changed, 2,247 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/21927/6
--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
