Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21927 )
Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................

Patch Set 4:

(4 comments)

Thanks for adding the extra tests!

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
Should this be totalNumPartitions > 1 instead? A single partition is very similar to the unpartitioned case.

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@450
PS4, Line 450: (int)
nit: this cast is unnecessary

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@122
PS4, Line 122: ss_sold_time_sk,
             : ss_item_sk,
             : ss_customer_sk,
             : ss_cdemo_sk,
             : ss_hdemo_sk,
             : ss_addr_sk,
             : ss_store_sk,
             : ss_promo_sk,
             : ss_ticket_number,
             : ss_quantity,
             : ss_wholesale_cost,
             : ss_list_price,
             : ss_sales_price,
             : ss_ext_discount_amt,
             : ss_ext_sales_price,
             : ss_ext_wholesale_cost,
             : ss_ext_list_price,
             : ss_ext_tax,
             : ss_coupon_amt,
             : ss_net_paid,
             : ss_net_paid_inc_tax,
             : ss_net_profit,
             : ss_sold_date_sk
nit: could be *

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: # Partition number is 1.
             : # There should be no shuffling.
You could also have a test where many more records are written to a single partition, e.g.:

  create table store_sales_1_huge_part partitioned by (part_col) stored as iceberg as
  select a.*, 100000 as part_col from store_sales a, store_sales b;

For me, this also generates a very high number of instances for a plan fragment:

  F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=384

This is because setupThresholdsForExecutorGroupSets() calls setNum_cores_per_executor(Integer.MAX_VALUE), so analyzer.getMaxParallelismPerNode() ends up using max_fragment_instances_per_node=128 as the limit (3 hosts * 128 = 384 instances).

Also, there is a RANDOM shuffle before the writer even though there is only one partition.

--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 18 Oct 2024 12:54:28 +0000
Gerrit-HasComments: Yes
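For illustration, the instances=384 figure in the plan output quoted in the review can be reproduced with a small Java sketch. It assumes the per-node limit is simply the minimum of num_cores_per_executor and max_fragment_instances_per_node, and that a fragment's instance count is that limit times the number of hosts. The class and method names (InstanceEstimate, maxParallelismPerNode, maxInstances) are hypothetical and are not Impala's actual planner code.

```java
/**
 * Hypothetical sketch (not Impala's planner code). ASSUMPTION: per-node
 * parallelism is the smaller of the per-executor core count and
 * max_fragment_instances_per_node, and a fragment's instance count is that
 * per-node limit multiplied by the number of hosts.
 */
class InstanceEstimate {

  /** Assumed per-node cap: the smaller of the core count and the query option. */
  static int maxParallelismPerNode(int numCoresPerExecutor, int maxFragmentInstancesPerNode) {
    return Math.min(numCoresPerExecutor, maxFragmentInstancesPerNode);
  }

  /** Assumed upper bound on instances for a fragment running on numHosts executors. */
  static long maxInstances(int numHosts, int numCoresPerExecutor,
      int maxFragmentInstancesPerNode) {
    // Widen to long so a large per-node cap times many hosts cannot overflow int.
    return (long) numHosts
        * maxParallelismPerNode(numCoresPerExecutor, maxFragmentInstancesPerNode);
  }

  public static void main(String[] args) {
    // Core count set to Integer.MAX_VALUE: the query option (128) becomes the cap.
    System.out.println(maxInstances(3, Integer.MAX_VALUE, 128)); // 384
    // With a hypothetical 16-core executor, the core count is the cap instead.
    System.out.println(maxInstances(3, 16, 128)); // 48
  }
}
```

Under these assumptions, once the core count is effectively unbounded, max_fragment_instances_per_node alone determines the per-node parallelism. That is consistent with the 3-host, 384-instance fragment reported above.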
