Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21927 )
Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................

IMPALA-13445: Ignore num partition for unpartitioned writes

When cost-based planning is used, writer parallelism is limited by the number of partitions. An unpartitioned insert has just a single partition, so only a single fs writer is planned, which causes slow writes.

This patch fixes the issue by distinguishing between partitioned and unpartitioned inserts, which causes the following BEHAVIOR CHANGE if COMPUTE_PROCESSING_COST=1:
1. If the insert is unpartitioned, use the byte-based estimate fully. Shuffling should only happen if the number of writers is less than the number of input fragment instances.
2. If the insert is partitioned, try to plan at least one writer for each shuffling executor node, but do not exceed the number of partitions. However, if the number of partitions is 1, try to force writer colocation with the input fragment.

Both partitioned and unpartitioned inserts still respect the MAX_FS_WRITERS option.

This patch also does minor cleanup in DistributedPlanner.java.

Testing:
- In test_executor_groups.py, move insert tests from test_query_cpu_count_divisor_default into a separate test_query_cpu_count_on_insert. Add some new insert test cases there.
- Add and pass CardinalityTest.testByteBasedNumWriters().
- Add new planner tests under TpcdsCpuCostPlannerTest.
- Pass test_executor_groups.py.
Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Reviewed-on: http://gerrit.cloudera.org:8080/21927
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/service/query-state-record.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsCpuCostPlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M tests/custom_cluster/test_executor_groups.py
10 files changed, 2,572 insertions(+), 110 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
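The BEHAVIOR CHANGE rules in the commit message can be sketched as a small Java (16+) function. This is a minimal, hypothetical illustration, not the code merged into DistributedPlanner.java or HdfsTableSink.java: the class WriterCountSketch, the record Plan, and every parameter name are invented here. Two details are assumptions that the change does not state: a multi-partition insert always shuffles to its writers, and when no shuffle is planned the writers run inside the input fragment, one per instance.

```java
/**
 * Hypothetical sketch of the IMPALA-13445 writer-count rules.
 * Names are illustrative only; they are not Impala planner APIs.
 */
public final class WriterCountSketch {

  /** Planned number of table writers and whether input is shuffled to them. */
  public record Plan(int numWriters, boolean shuffle) {}

  /**
   * @param partitioned       true if the INSERT target is partitioned
   * @param byteBasedWriters  writer count estimated from the bytes to write
   * @param numPartitions     estimated number of target partitions
   * @param numInputInstances number of instances of the input fragment
   * @param numExecutorNodes  number of executor nodes a shuffle could target
   * @param maxFsWriters      MAX_FS_WRITERS query option; 0 means "no limit"
   */
  public static Plan plan(boolean partitioned, int byteBasedWriters, int numPartitions,
      int numInputInstances, int numExecutorNodes, int maxFsWriters) {
    boolean mayColocate = !partitioned || numPartitions == 1;
    int writers;
    if (!partitioned) {
      // Rule 1: an unpartitioned insert uses the byte-based estimate fully.
      writers = byteBasedWriters;
    } else if (numPartitions == 1) {
      // Rule 2, single partition: try to colocate writers with the input fragment.
      writers = numInputInstances;
    } else {
      // Rule 2: at least one writer per shuffling executor node,
      // but never more writers than partitions.
      writers = Math.min(Math.max(byteBasedWriters, numExecutorNodes), numPartitions);
    }
    // MAX_FS_WRITERS is respected in every case.
    if (maxFsWriters > 0) {
      writers = Math.min(writers, maxFsWriters);
    }
    writers = Math.max(1, writers);

    if (!mayColocate) {
      // Assumption: a multi-partition insert always shuffles by partition key.
      return new Plan(writers, true);
    }
    // Shuffle only when there are fewer writers than input instances; otherwise
    // (assumption) writers run inside the input fragment, one per instance.
    boolean shuffle = writers < numInputInstances;
    return new Plan(shuffle ? writers : numInputInstances, shuffle);
  }

  public static void main(String[] args) {
    // Output format below is what OpenJDK's record toString() produces.
    System.out.println(plan(false, 6, 1, 12, 3, 0));  // Plan[numWriters=6, shuffle=true]
    System.out.println(plan(false, 20, 1, 12, 3, 0)); // Plan[numWriters=12, shuffle=false]
    System.out.println(plan(true, 2, 100, 12, 3, 0)); // Plan[numWriters=3, shuffle=true]
    System.out.println(plan(true, 2, 1, 12, 3, 4));   // Plan[numWriters=4, shuffle=true]
  }
}
```

The first two calls show the unpartitioned case: the writer count follows the byte-based estimate instead of being pinned to one writer for the single partition, and no shuffle is planned once the estimate reaches the number of input instances. The last call shows MAX_FS_WRITERS forcing a shuffle even when a single-partition insert would otherwise colocate its writers.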
