Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21927 )

Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
> Should this be totalNumPartitions > 1 instead? As that is very similar to t
This was intentional, to avoid writing many files into a single partition. I don't mind making an exception, but what is the threshold? What if totalNumPartitions = 2 and there are many bytes to write?

I also believe that having more than one writer will not work, since only one of them will receive rows from shuffling.


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@122
PS4, Line 122:   ss_sold_time_sk,
:   ss_item_sk,
:   ss_customer_sk,
:   ss_cdemo_sk,
:   ss_hdemo_sk,
:   ss_addr_sk,
:   ss_store_sk,
:   ss_promo_sk,
:   ss_ticket_number,
:   ss_quantity,
:   ss_wholesale_cost,
:   ss_list_price,
:   ss_sales_price,
:   ss_ext_discount_amt,
:   ss_ext_sales_price,
:   ss_ext_wholesale_cost,
:   ss_ext_list_price,
:   ss_ext_tax,
:   ss_coupon_amt,
:   ss_net_paid,
:   ss_net_paid_inc_tax,
:   ss_net_profit,
:   ss_sold_date_sk
> nit: could be *
I think the planner insists that I list column names when I declare "partitioned by", but I'll double-check.

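To illustrate the point in the first comment above that extra writers receive no rows when there is a single partition, here is a minimal sketch. It assumes rows are routed to writers by hashing the partition key. The class ShuffleSketch and its methods are hypothetical and are not Impala's exchange code. The writer count of 12 mirrors the setMax_fragment_instances_per_node(12) setting mentioned in this review.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical sketch of hash-based shuffling to writers; not Impala code. */
public class ShuffleSketch {
  /** Picks a writer for a row by hashing its partition key (assumed routing rule). */
  static int writerFor(Object partitionKey, int numWriters) {
    return Math.floorMod(Objects.hashCode(partitionKey), numWriters);
  }

  /** Counts how many rows each writer receives for the given partition keys. */
  static int[] rowsPerWriter(Object[] partitionKeys, int numWriters) {
    int[] counts = new int[numWriters];
    for (Object key : partitionKeys) {
      counts[writerFor(key, numWriters)]++;
    }
    return counts;
  }

  /** Returns the number of writers that received at least one row. */
  static long busyWriters(int[] counts) {
    return Arrays.stream(counts).filter(c -> c > 0).count();
  }

  public static void main(String[] args) {
    Object[] keys = new Object[1000];
    // Every row carries the same constant partition value, as in "100000 as part_col".
    Arrays.fill(keys, 100000);
    int[] counts = rowsPerWriter(keys, 12);
    System.out.println("busy writers: " + busyWriters(counts));  // prints "busy writers: 1"
  }
}
```

Under this assumed routing rule, all 1000 rows hash to the same writer, so the other 11 writers stay idle regardless of how many bytes are written.
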

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: # Partition number is 1.
: # There should be no shuffling.
> You could also have a test where many more records are written to a single
It should not go that high in this planner test, because tpcdsParquetCpuCostQueryOptions declares .setMax_fragment_instances_per_node(12). But I will double-check.

I'm not sure why "100000 as part_col" is not recognized as having an NDV of 1, but having more than one writer will not work, since only one of them will receive rows from shuffling.



-- 
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 18 Oct 2024 14:49:31 +0000
Gerrit-HasComments: Yes
