Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21927 )

Change subject: IMPALA-13445: Ignore num partition for unpartitioned writes
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@446
PS4, Line 446: totalNumPartitions > 0
> Static partition INSERTs are probably not uncommon, so I think it's worth s
Done. Tested with a planner test that creates the table customer_address_1_huge_part.


http://gerrit.cloudera.org:8080/#/c/21927/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@450
PS4, Line 450: Math.
> nit: this cast is unnecessary
Done


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test:

http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@122
PS4, Line 122: ---- PARALLELPLANS
             : Max Per-Host Resource Reservation: Memory=168.00MB Threads=24
             : Per-Host Resource Estimates: Memory=78.24GB
             : F01:PLAN FRAGMENT [HASH(tpcds_partitioned_parquet_snap.store_sales.ss_sold_date_sk)] hosts=10 instances=120
             : |  Per-Instance Resources: mem-estimate=6.46GB mem-reservation=6.00MB thread-reservation=1
             : |  max-parallelism=120 segment-costs=[44372981821, 97047706322]
             : WRITE TO HDFS [tpcds_partitioned_parquet_snap.store_sales_duplicate, OVERWRITE=false, PARTITION-KEYS=(ss_sold_date_sk)]
             : |  output exprs: ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, ss_sold_date_sk
             : |  mem-estimate=100.00KB mem-reservation=0B thread-reservation=0 cost=97047706322
             : |
             : 02:SORT
             : |  order by: ss_sold_date_sk ASC NULLS LAST
             : |  mem-estimate=6.44GB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0
             : |  tuple-ids=2 row-size=96B cardinality=8.64G cost=39597689640
             : |  in pipelines: 02(GETNEXT), 00(OPEN)
             : |
             : 01:EXCHANGE [HASH(tpcds_partitioned_parquet_snap.store_sales.ss_sold_date_sk)]
             : |  mem-estimate=21.72MB mem-reservation=0B thread-reservation=0
             : |  tuple-ids=0 row-size=96B cardinality=8.64G cost=4775292181
             : |  in pipelines: 00(GETNEXT)
             : |
             : F00:PLAN FRAGMENT [RANDOM] hosts=10 instances=120
             : Per-Instance Re
> I think Planner insist that I list column names when I declare "partitioned
Done


http://gerrit.cloudera.org:8080/#/c/21927/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test@311
PS4, Line 311: |  |  tuple-ids=1 row-size=0B cardinality=15.00M cost=19934990
             : |  |  in pipelines: 01(GETNEXT)
> It went that high in my dev environment and I found it strange because I do
The max_fragment_instances_per_node query option should be the safeguard here:

  public int getMaxParallelismPerNode() {
    if (getQueryOptions().isCompute_processing_cost()) {
      return Math.max(getMinParallelismPerNode(),
          Math.min(getQueryOptions().getMax_fragment_instances_per_node(),
              getAvailableCoresPerNode()));
    } else if (getQueryOptions().getMt_dop() > 0) {
      return getQueryOptions().getMt_dop();
    } else {
      return 1;
    }
  }

Downstream, we set it as a default query option. I can look into bounding this
further using information from the executor group set config, but that would
involve wider changes that deserve their own patch.
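
To make the clamping concrete, here is a minimal standalone sketch of the same
logic. It is not the Impala code: the class name, the flattened parameter list,
and all input values are hypothetical and stand in for the query options and
the per-node core count read by getMaxParallelismPerNode() above.

```java
// Standalone sketch of the clamp in getMaxParallelismPerNode().
// All names and values here are illustrative assumptions, not Impala APIs.
public class ParallelismClampSketch {

  // Mirrors the three branches of the quoted method, with the query
  // options passed in as plain arguments.
  static int maxParallelismPerNode(boolean computeProcessingCost,
      int minParallelismPerNode, int maxFragmentInstancesPerNode,
      int availableCoresPerNode, int mtDop) {
    if (computeProcessingCost) {
      // Upper bound: the smaller of the option and the core count.
      // Lower bound: the configured minimum parallelism.
      return Math.max(minParallelismPerNode,
          Math.min(maxFragmentInstancesPerNode, availableCoresPerNode));
    } else if (mtDop > 0) {
      return mtDop;
    } else {
      return 1;
    }
  }

  public static void main(String[] args) {
    // Option set to 12 on a 64-core node: the option is the binding limit.
    System.out.println(maxParallelismPerNode(true, 1, 12, 64, 0));
    // Option set to 128 on a 16-core node: the core count is the binding limit.
    System.out.println(maxParallelismPerNode(true, 1, 128, 16, 0));
    // Processing cost disabled, mt_dop=4: mt_dop is returned unchanged.
    System.out.println(maxParallelismPerNode(false, 1, 12, 64, 4));
  }
}
```

With these hypothetical inputs, lowering the option caps per-node parallelism
even when the node reports many cores, which is the safeguard described above.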



--
To view, visit http://gerrit.cloudera.org:8080/21927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51ab8fc35a5489351a88d372b28642b35449acfc
Gerrit-Change-Number: 21927
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 18 Oct 2024 17:14:47 +0000
Gerrit-HasComments: Yes
