Hello Andrew Sherman, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20523
to look at the new patch set (#4).
Change subject: IMPALA-12477: Make Iceberg planFiles() use multiple threads
......................................................................
IMPALA-12477: Make Iceberg planFiles() use multiple threads
Impala is not using Iceberg’s planFiles() API in a performant way:
https://github.com/apache/impala/blob/2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833/fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java#L46
Instead of a for-loop, we should use forEach(), as Hive does:
https://github.com/apache/hive/blob/071b721d8d73cc4d5d2d9469d7953bdc75ff615f/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L222
The forEach() spreads the work across multiple threads. This not only
improves table loading times, but also speeds up queries that call
planFiles(), e.g. queries that push down predicates to Iceberg and
time-travel queries.
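
To illustrate the difference, here is a minimal, self-contained sketch. It
does not use Iceberg and is not the code in this patch: ManifestBackedIterable,
collectWithLoop() and collectWithForEach() are hypothetical names, and the
class only assumes an iterable whose iterator() walks its groups one by one
while its forEach() override hands each group to a thread pool. Because the
callback may then run on several threads at once, the forEach() variant
collects into a concurrent queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class PlanFilesSketch {
  // Hypothetical stand-in for an iterable of planned files. Each inner list
  // plays the role of one manifest that can be read independently.
  static final class ManifestBackedIterable implements Iterable<String> {
    private final List<List<String>> manifests;
    private final int threads;

    ManifestBackedIterable(List<List<String>> manifests, int threads) {
      this.manifests = manifests;
      this.threads = threads;
    }

    // A for-loop goes through iterator(): in this sketch that path reads the
    // manifests one after another on the calling thread.
    @Override
    public Iterator<String> iterator() {
      List<String> all = new ArrayList<>();
      for (List<String> manifest : manifests) all.addAll(manifest);
      return all.iterator();
    }

    // forEach() is overridden to process each manifest on a pool thread.
    // The action must therefore be safe to call concurrently.
    @Override
    public void forEach(Consumer<? super String> action) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      try {
        List<Future<?>> futures = new ArrayList<>();
        for (List<String> manifest : manifests) {
          futures.add(pool.submit(() -> manifest.forEach(action)));
        }
        for (Future<?> f : futures) f.get();  // wait and surface failures
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
      } finally {
        pool.shutdown();
      }
    }
  }

  // Sequential variant: a plain for-loop over the iterable.
  static List<String> collectWithLoop(Iterable<String> files) {
    List<String> out = new ArrayList<>();
    for (String f : files) out.add(f);
    return out;
  }

  // forEach() variant: collects into a thread-safe queue because the
  // callback may be invoked from several threads.
  static List<String> collectWithForEach(Iterable<String> files) {
    Queue<String> out = new ConcurrentLinkedQueue<>();
    files.forEach(out::add);
    return new ArrayList<>(out);
  }

  public static void main(String[] args) {
    Iterable<String> files = new ManifestBackedIterable(
        Arrays.asList(
            Arrays.asList("a.parquet", "b.parquet"),
            Arrays.asList("c.parquet", "d.parquet")),
        2);
    System.out.println("for-loop: " + collectWithLoop(files).size() + " files");
    System.out.println("forEach:  " + collectWithForEach(files).size() + " files");
  }
}
```

Both variants return the same set of files; only the order of the forEach()
result may differ between runs, so callers should not rely on it.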
Change-Id: I00db941dd5ac9917cd91d990fccf37e5bcfddbfc
---
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
2 files changed, 44 insertions(+), 29 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/20523/4
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I00db941dd5ac9917cd91d990fccf37e5bcfddbfc
Gerrit-Change-Number: 20523
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>