Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21927 )

Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
> This was intentional to avoid writing many files into single partition.
Static partition INSERTs are probably not uncommon, so I think it's worth 
special-casing them; we can then deal with dynamic partitioning separately.

When only a single partition is targeted it might make sense to not shuffle at 
all, or randomly shuffle among a set of writers.

Partitioned INSERTs are slowest when the data volume is large but there are 
only a few target partitions. For such cases we should probably invent some 
kind of bin-packing distribution (i.e. each partition has a set of associated 
writers), but that is out of scope for this patch.
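
Purely as an illustration of the "set of associated writers" idea, here is a 
minimal standalone sketch. Everything in it is hypothetical: the class, the 
method names, and the per-partition byte estimates are assumptions made for 
the example and are not part of HdfsTableSink or any existing Impala API. It 
gives every partition one writer, then greedily hands each spare writer to 
the partition with the most estimated bytes per writer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not Impala code. */
public class PartitionWriterAssignment {

  /**
   * Assigns each partition a disjoint set of writer ids. Every partition
   * gets one writer; each spare writer then goes to the partition that
   * currently has the highest estimated bytes per writer.
   * Assumes numWriters >= number of partitions.
   */
  public static Map<String, List<Integer>> assignWriters(
      Map<String, Long> estimatedBytes, int numWriters) {
    int n = estimatedBytes.size();
    if (numWriters < n) {
      throw new IllegalArgumentException(
          "need at least one writer per partition");
    }
    List<String> names = new ArrayList<>(estimatedBytes.keySet());
    int[] counts = new int[n];
    for (int i = 0; i < n; i++) counts[i] = 1;

    // Greedy step: give each spare writer to the most loaded partition.
    for (int spare = 0; spare < numWriters - n; spare++) {
      int best = 0;
      for (int i = 1; i < n; i++) {
        double load = (double) estimatedBytes.get(names.get(i)) / counts[i];
        double bestLoad =
            (double) estimatedBytes.get(names.get(best)) / counts[best];
        if (load > bestLoad) best = i;
      }
      counts[best]++;
    }

    // Hand out consecutive writer ids, partition by partition.
    Map<String, List<Integer>> result = new LinkedHashMap<>();
    int nextWriter = 0;
    for (int i = 0; i < n; i++) {
      List<Integer> writers = new ArrayList<>();
      for (int w = 0; w < counts[i]; w++) writers.add(nextWriter++);
      result.put(names.get(i), writers);
    }
    return result;
  }

  /** Spreads the rows of one partition over that partition's writers. */
  public static int pickWriter(List<Integer> writers, long rowHash) {
    return writers.get((int) Math.floorMod(rowHash, (long) writers.size()));
  }
}
```

With two partitions estimated at 900 and 100 bytes and five writers, the 
large partition ends up with four writers and the small one with a single 
writer, so the data is no longer funneled through one writer per partition.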


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: # Partition number is 1.
             : # There should be no shuffling.
> It should not go that high in this planner test because tpcdsParquetCpuCost
It went that high in my dev environment and I found it strange because I don't 
have that many cores.

Without cost-based planning we didn't shuffle at all, which means it can 
regress the performance of such INSERTs if the data volume is large.



--
To view, visit http://gerrit.cloudera.org:8080/21927

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 18 Oct 2024 15:12:14 +0000
Gerrit-HasComments: Yes
