Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20700 )
Change subject: IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone ...................................................................... Patch Set 2: (5 comments) Left some minor comments, otherwise the change looks great! http://gerrit.cloudera.org:8080/#/c/20700/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20700/2//COMMIT_MSG@17 PS2, Line 17: 3-4x : speedup. Could you please elaborate on this a little more? I think it's worth to highlight that this was in your minicluster environment (if I assume correctly). What was the partition:file ration? Did you use multi-level partitioning? Also, what was the size of the thread pool? http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@206 PS2, Line 206: { nit: for single-line if statements we omit the braces. http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@239 PS2, Line 239: partitionPaths = new HashSet<>(pool.submit(this ::icebergFilesByPartition).get()); Can we just invoke icebergFilesByPartition() in the current thread? Submitting it to the pool and getting it immediately doesn't seem to have any added value. http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@247 PS2, Line 247: pool.shutdown(); nit: for readability, it worth to write it in its own line. http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@30 PS2, Line 30: import java.util.concurrent.ForkJoinPool; The new imports seem to be unused. -- To view, visit http://gerrit.cloudera.org:8080/20700 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d Gerrit-Change-Number: 20700 Gerrit-PatchSet: 2 Gerrit-Owner: Peter Rozsa <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 21 Nov 2023 13:32:13 +0000 Gerrit-HasComments: Yes
