Hello Riza Suminto, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20973

to look at the new patch set (#3).

Change subject: IMPALA-12765: Balance consecutive partitions better for Iceberg 
tables
......................................................................

IMPALA-12765: Balance consecutive partitions better for Iceberg tables

During remote read scheduling Impala does the following:

Non-Iceberg tables
 * The scheduler processes the scan ranges in partition key order
 * The scheduler selects N executors as candidates
 * The scheduler chooses the executor from the candidates based on
   minimum number of assigned bytes
 * So consecutive partitions are more likely to be assigned to
   different executors

Iceberg tables
 * The scheduler processes the scan ranges in random order
 * The scheduler selects N executors as candidates
 * The scheduler chooses the executor from the candidates based on
   minimum number of assigned bytes
 * So consecutive partitions (by partition key order) are assigned
   in effectively random order, i.e. there's a higher chance that
   neighbouring partitions cluster on the same executor
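
The two steps shared by both lists (select N candidates, then choose the
candidate with the fewest assigned bytes) can be sketched roughly as
follows. This is only an illustration, not the scheduler's code: the
pick_candidates() helper, the hash-based ranking, the executor names and
the file paths are all assumptions made up for the example.

```python
import hashlib


def pick_candidates(path, executors, num_candidates):
    # Hypothetical candidate selection: rank executors by a hash of
    # (path, executor) and keep the first num_candidates. How the real
    # scheduler selects candidates is not described in this message.
    def rank(executor):
        return hashlib.sha256(f"{path}|{executor}".encode()).hexdigest()
    return sorted(executors, key=rank)[:num_candidates]


def schedule(scan_ranges, executors, num_candidates):
    """Assign (path, size_in_bytes) scan ranges in the order given."""
    assigned_bytes = {e: 0 for e in executors}
    assignment = []
    for path, size in scan_ranges:
        candidates = pick_candidates(path, executors, num_candidates)
        # Choose the candidate with the fewest bytes assigned so far.
        chosen = min(candidates, key=lambda e: assigned_bytes[e])
        assigned_bytes[chosen] += size
        assignment.append((path, chosen))
    return assignment, assigned_bytes


# Made-up executors and equally sized scan ranges, in partition key order.
executors = ["exec-0", "exec-1", "exec-2"]
ranges = [(f"inv_date_sk={2450815 + i}/data.parq", 100) for i in range(9)]
assignment, assigned_bytes = schedule(ranges, executors, num_candidates=3)
for path, executor in assignment:
    print(path, "->", executor)
```

In this toy setup every executor is a candidate and all ranges have the
same size, so each run of three neighbouring ranges lands on three
different executors. That only holds because the ranges are processed in
partition key order.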

With this patch, IcebergScanNode orders its file descriptors based on
their paths, so consecutive partitions are scheduled in a more balanced
way. This is especially important for queries that prune partitions via
runtime filters (e.g. due to a JOIN): even if we schedule the scan
ranges evenly, the scan ranges that survive the runtime filters can
still be clustered on certain executors.
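
As a rough illustration of why ordering by path helps, the sketch below
sorts a shuffled list of paths. The Hive-style directory layout and the
file names are assumptions for the example; actual Iceberg data file
locations may look different. Note that a plain string sort only matches
partition key order when the key values have the same width.

```python
import random

# Hypothetical Hive-style paths; real Iceberg data file locations may differ.
paths = [
    f"/warehouse/inventory/data/inv_date_sk={key}/{name}.parq"
    for key in (2450815, 2450822, 2450829)
    for name in ("a", "b")
]

# Simulate file descriptors arriving in no particular order.
shuffled = paths[:]
random.Random(0).shuffle(shuffled)

# Sorting by path restores a stable, partition-by-partition order.
ordered = sorted(shuffled)
for p in ordered:
    print(p)
```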

E.g. TPC-DS Q22 has the following JOIN and WHERE predicates:

 inv_date_sk=d_date_sk and
 d_month_seq between 1199 and 1199 + 11

The Inventory table is partitioned by column inv_date_sk, and we filter
the rows in the joined table by 'd_month_seq between 1199 and
1199 + 11'. This means that we will only need a range of partitions from
the Inventory table, but that range will only be revealed at runtime.
Scheduling neighbouring partitions to different executors means that the
surviving partitions are spread across executors more evenly.
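
The effect on a surviving range can be sketched with a deliberately
simplified model: every executor is a candidate and all scan ranges have
the same size, so choosing the executor with the fewest assigned bytes
degenerates to round-robin over the processing order. The partition
count, the executor count and the surviving range below are made-up
numbers, not taken from TPC-DS.

```python
import random
from collections import Counter


def assign_least_loaded(order, num_executors):
    # Simplified model: all executors are candidates and every scan range
    # has the same size, so each range goes to the least loaded executor
    # (ties broken by lowest executor index).
    load = [0] * num_executors
    owner = {}
    for partition in order:
        executor = load.index(min(load))
        load[executor] += 1
        owner[partition] = executor
    return owner


def skew(owner, surviving, num_executors):
    # Difference between the busiest and the idlest executor, counting
    # only the partitions that survive the runtime filter.
    counts = Counter(owner[p] for p in surviving)
    per_executor = [counts.get(e, 0) for e in range(num_executors)]
    return max(per_executor) - min(per_executor)


NUM_EXECUTORS = 4
partitions = list(range(120))   # hypothetical partition keys, in key order
surviving = range(40, 52)       # range that is only revealed at runtime

random_order = partitions[:]
random.Random(42).shuffle(random_order)

sorted_skew = skew(assign_least_loaded(partitions, NUM_EXECUTORS),
                   surviving, NUM_EXECUTORS)
random_skew = skew(assign_least_loaded(random_order, NUM_EXECUTORS),
                   surviving, NUM_EXECUTORS)
print("skew with sorted order:", sorted_skew)
print("skew with random order:", random_skew)
```

In this model the sorted order always spreads a consecutive range as
evenly as possible, while the random order gives no such guarantee.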

Testing:
 * e2e test

Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
---
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M tests/query_test/test_iceberg.py
2 files changed, 62 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/20973/3
--
To view, visit http://gerrit.cloudera.org:8080/20973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60773965ecbb4d8e659db158f1f0ac76086d5578
Gerrit-Change-Number: 20973
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
