Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21927 )

Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................


Patch Set 4:

(4 comments)

Thanks for adding the extra tests!

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
Should this be totalNumPartitions > 1 instead? Writing to a single partition 
is very similar to the unpartitioned case.
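
To make the difference concrete, here is a minimal sketch comparing the two 
predicates. The class and method names are hypothetical and are not taken from 
HdfsTableSink.java, and the long parameter type is an assumption. The only 
input on which the two checks disagree is totalNumPartitions == 1.

```java
// Hypothetical sketch; names and the long type are assumptions,
// not code from HdfsTableSink.java.
public class PartitionCheckSketch {
  // Condition as written in PS4, line 446.
  static boolean currentCheck(long totalNumPartitions) {
    return totalNumPartitions > 0;
  }

  // Condition proposed in this comment.
  static boolean proposedCheck(long totalNumPartitions) {
    return totalNumPartitions > 1;
  }

  public static void main(String[] args) {
    // Print both results side by side for a few representative counts.
    for (long n : new long[] {-1, 0, 1, 2, 100}) {
      System.out.printf("totalNumPartitions=%d current=%b proposed=%b%n",
          n, currentCheck(n), proposedCheck(n));
    }
  }
}
```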


http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@450
PS4, Line 450: (int)
nit: this cast is unnecessary


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@122
PS4, Line 122: ss_sold_time_sk,
             : ss_item_sk,
             : ss_customer_sk,
             : ss_cdemo_sk,
             : ss_hdemo_sk,
             : ss_addr_sk,
             : ss_store_sk,
             : ss_promo_sk,
             : ss_ticket_number,
             : ss_quantity,
             : ss_wholesale_cost,
             : ss_list_price,
             : ss_sales_price,
             : ss_ext_discount_amt,
             : ss_ext_sales_price,
             : ss_ext_wholesale_cost,
             : ss_ext_list_price,
             : ss_ext_tax,
             : ss_coupon_amt,
             : ss_net_paid,
             : ss_net_paid_inc_tax,
             : ss_net_profit,
             : ss_sold_date_sk
nit: this column list could be replaced with *


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: # Partition number is 1.
             : # There should be no shuffling.
You could also have a test where many more records are written to a single 
partition, e.g.:

 create table store_sales_1_huge_part
 partitioned by (part_col)
 stored as iceberg as
   select a.*, 100000 as part_col
   from store_sales a, store_sales b;

For me this also generates a very high number of instances for a plan fragment:

 F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=384

This is because setupThresholdsForExecutorGroupSets() calls 
setNum_cores_per_executor(Integer.MAX_VALUE), so 
analyzer.getMaxParallelismPerNode() uses max_fragment_instances_per_node=128 
as the limit (3 hosts x 128 = 384 instances).
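
The arithmetic behind instances=384 can be sketched as below. This assumes 
the per-node limit is the smaller of the core count and 
max_fragment_instances_per_node; the real getMaxParallelismPerNode() may 
compute it differently, and the class and method names here are hypothetical.

```java
// Hypothetical sketch of the instance cap; not the actual Impala planner code.
public class InstanceCapSketch {
  // Assumed rule: the per-node limit is the smaller of the core count and
  // max_fragment_instances_per_node.
  static int maxParallelismPerNode(int numCoresPerExecutor,
      int maxFragmentInstancesPerNode) {
    return Math.min(numCoresPerExecutor, maxFragmentInstancesPerNode);
  }

  // Upper bound on fragment instances across all hosts.
  static long maxInstances(int numHosts, int numCoresPerExecutor,
      int maxFragmentInstancesPerNode) {
    return (long) numHosts
        * maxParallelismPerNode(numCoresPerExecutor, maxFragmentInstancesPerNode);
  }

  public static void main(String[] args) {
    // With the core count set to Integer.MAX_VALUE, only the 128 limit applies.
    System.out.println(maxInstances(3, Integer.MAX_VALUE, 128)); // prints 384
  }
}
```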

Also, there is a RANDOM shuffle before the writer even though the number of 
partitions is 1.



--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 18 Oct 2024 12:54:28 +0000
Gerrit-HasComments: Yes
