Zoltan Borok-Nagy has uploaded this change for review. (http://gerrit.cloudera.org:8080/20523)
Change subject: IMPALA-12477: Make Iceberg planFiles() use multiple threads
......................................................................

IMPALA-12477: Make Iceberg planFiles() use multiple threads

Impala is not using Iceberg's planFiles() API in a performant way:
https://github.com/apache/impala/blob/2d3289027c2ffdd245d13b60e6fa3f9b3e7bf833/fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java#L46

Instead of a for-loop, we should use forEach() like Hive does:
https://github.com/apache/hive/blob/071b721d8d73cc4d5d2d9469d7953bdc75ff615f/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L222

forEach() spreads the work across multiple threads. This will not only improve table loading times, but also improve queries that use planFiles(), e.g. queries that push down predicates to Iceberg and time-travel queries.

Change-Id: I00db941dd5ac9917cd91d990fccf37e5bcfddbfc
---
M fe/src/main/java/org/apache/impala/catalog/iceberg/GroupedContentFiles.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
2 files changed, 44 insertions(+), 29 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/20523/1
--
To view, visit http://gerrit.cloudera.org:8080/20523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I00db941dd5ac9917cd91d990fccf37e5bcfddbfc
Gerrit-Change-Number: 20523
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
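The commit message states only that forEach() spreads the work across multiple threads while a plain for-loop does not. The Java sketch below illustrates that difference under stated assumptions. It is not Iceberg's or Impala's code: ParallelForEachIterable and PlanFilesForEachDemo are hypothetical names, and the nested lists merely stand in for independent groups of scan tasks. A for-loop goes through iterator() and consumes every element on the calling thread. The overridden forEach() hands each group to a worker pool and waits for all of them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/**
 * Hypothetical iterable (not an Iceberg class): elements come from several
 * independent groups. iterator() walks them sequentially on the caller's
 * thread; forEach() processes each group on a worker thread.
 */
final class ParallelForEachIterable<T> implements Iterable<T> {
  private final List<List<T>> groups;
  private final int threads;

  ParallelForEachIterable(List<List<T>> groups, int threads) {
    this.groups = groups;
    this.threads = threads;
  }

  @Override
  public Iterator<T> iterator() {
    // Sequential path: this is what a plain for-loop uses.
    return groups.stream().flatMap(List::stream).iterator();
  }

  @Override
  public void forEach(Consumer<? super T> action) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<T> group : groups) {
        // Each group is consumed on a worker thread, not the caller's thread.
        futures.add(pool.submit(() -> group.forEach(action)));
      }
      for (Future<?> future : futures) {
        future.get(); // Wait for completion and surface worker failures.
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}

public class PlanFilesForEachDemo {
  public static void main(String[] args) {
    List<List<String>> groups = List.of(
        List.of("a.parquet", "b.parquet"),
        List.of("c.parquet"),
        List.of("d.parquet", "e.parquet"));
    ParallelForEachIterable<String> tasks = new ParallelForEachIterable<>(groups, 3);

    // The callback may run on several threads at once, so it only touches
    // a thread-safe collection.
    Queue<String> collected = new ConcurrentLinkedQueue<>();
    tasks.forEach(collected::add);
    System.out.println(collected.size()); // prints 5
  }
}
```

Switching from a for-loop to forEach() changes the contract for the loop body: it may run concurrently and in no particular order. Any state it updates, such as per-category lists of files, therefore needs thread-safe collections or explicit synchronization.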
