Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21927 )
Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
> This was intentional to avoid writing many files into single partition.
Static partition INSERTs are probably not uncommon, so I think it's worth special-casing them; then we can deal with dynamic partitioning separately.

When only a single partition is targeted, it might make sense not to shuffle at all, or to shuffle randomly among a set of writers.

Partitioned INSERTs are the slowest when the data volume is large but there are only a few target partitions. For such cases we should probably invent some kind of bin-packing distribution (i.e. each partition has a set of associated writers), but that is out of scope for this patch.


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: # Partition number is 1.
             : # There should be no shuffling.
> It should not go that high in this planner test because tpcdsParquetCpuCost
It went that high in my dev environment, and I found that strange because I don't have that many cores. Without cost-based planning we didn't shuffle at all, which means this can regress the performance of such INSERTs when the data volume is large.
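To make the "each partition has a set of associated writers" idea a bit more concrete, here is a minimal hypothetical Java sketch. It is not Impala code: the class PartitionWriterAssignment, its methods, and the per-partition row estimates it takes as input are all assumptions for illustration. It also uses a simple proportional allocation rather than a real bin-packing algorithm. Each partition gets a number of writers proportional to its estimated share of rows (at least one), and rows for that partition are spread round-robin over its writers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not part of the Impala planner.
class PartitionWriterAssignment {
  // Writer ids associated with each target partition.
  private final Map<String, int[]> writersByPartition = new HashMap<>();
  // Per-partition round-robin position into its writer set.
  private final Map<String, Integer> cursor = new HashMap<>();

  /**
   * Assumes numWriters > 0. Gives each partition max(1, round(share * numWriters))
   * writers, capped at numWriters. Writer ids are handed out round-robin, so when
   * rounding over-allocates, some writers end up shared by several small partitions.
   */
  PartitionWriterAssignment(Map<String, Long> estimatedRows, int numWriters) {
    long total = 0;
    for (long rows : estimatedRows.values()) {
      total += rows;
    }
    int next = 0;
    for (Map.Entry<String, Long> e : estimatedRows.entrySet()) {
      double share = (total == 0)
          ? 1.0 / estimatedRows.size()
          : (double) e.getValue() / total;
      int count = Math.min(numWriters, Math.max(1, (int) Math.round(share * numWriters)));
      int[] ids = new int[count];
      for (int i = 0; i < count; i++) {
        ids[i] = next;
        next = (next + 1) % numWriters;
      }
      writersByPartition.put(e.getKey(), ids);
      cursor.put(e.getKey(), 0);
    }
  }

  /** Returns the writer that should receive the next row of the given partition. */
  int writerFor(String partition) {
    int[] ids = writersByPartition.get(partition);
    int c = cursor.get(partition);
    cursor.put(partition, (c + 1) % ids.length);
    return ids[c];
  }

  /** Number of writers associated with the given partition. */
  int numWritersFor(String partition) {
    return writersByPartition.get(partition).length;
  }
}
```

With a single target partition (the static-partition case) the whole writer set is associated with that one partition, so rows are spread over all writers instead of funneled into one; round-robin stands in here for the random distribution mentioned above. With skewed estimates such as 900 vs. 100 rows over 10 writers, the large partition gets 9 writers and the small one gets 1.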
--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 18 Oct 2024 15:12:14 +0000
Gerrit-HasComments: Yes
