Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/20434 )
Change subject: IMPALA-12408: Optimize HdfsScanNode.computeScanRangeLocations()
......................................................................

Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@333
PS5, Line 333: Map<Long, List<FileDescriptor>> sampledFiles_ = null;
> Please document what is the Long key in this map represent. Looks like it i
Added comment.

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1150
PS5, Line 1150: for (FeFsPartition partition: partitions_) {
> General question: is it worth or even possible to parallelize this loop? Ma
My guess is no: after the patch, this function doesn't seem to dominate planning time, at least in the queries I tested with. It is possible that with a huge number of partitions / files it would still be slow, but there can be other bottlenecks too. It would be interesting to collect profiles from a test run like tpcds; maybe there is other low-hanging fruit for speeding up the planner.
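For reference, a minimal sketch of what parallelizing a per-partition loop like the one at Line 1150 could look like, assuming each iteration only reads its own partition and produces independent results. `Partition`, `rangesFor`, `sequential` and `parallel` are hypothetical stand-ins, not Impala's `FeFsPartition` or `computeScanRangeLocations()`; as the reply above notes, whether this pays off depends on whether the loop actually dominates planning time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionLoopSketch {
  // Hypothetical stand-in for a partition: its location plus its file names.
  public static final class Partition {
    final String location;
    final List<String> files;

    public Partition(String location, List<String> files) {
      this.location = location;
      this.files = files;
    }
  }

  // Hypothetical per-partition work: one "scan range" string per file.
  // It touches only its own partition, so iterations are independent.
  public static List<String> rangesFor(Partition p) {
    return p.files.stream()
        .map(f -> p.location + "/" + f)
        .collect(Collectors.toList());
  }

  // Sequential version, mirroring a plain for-each over the partitions.
  public static List<String> sequential(List<Partition> partitions) {
    List<String> out = new ArrayList<>();
    for (Partition p : partitions) {
      out.addAll(rangesFor(p));
    }
    return out;
  }

  // Parallel version: no shared mutable state; collect() merges the
  // per-thread results and keeps encounter order for an ordered source.
  public static List<String> parallel(List<Partition> partitions) {
    return partitions.parallelStream()
        .flatMap(p -> rangesFor(p).stream())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Partition> parts = IntStream.range(0, 4)
        .mapToObj(i -> new Partition("/warehouse/t/p=" + i, List.of("a.parq", "b.parq")))
        .collect(Collectors.toList());
    // Both versions yield the same ranges in the same order.
    System.out.println(sequential(parts).equals(parallel(parts))); // prints true
  }
}
```

The parallel variant is only safe because the sketch's per-partition work has no shared mutable state; a real loop that updates shared counters or maps would need that state made thread-safe or merged after the fact.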
http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1153
PS5, Line 1153: String partitionLocation = partition.getLocation();
              : Path partitionPath = new Path(partitionLocation);
> Ok, so consistent hashCode from Java's String.hashCode() turns out to be im
Done

--
To view, visit http://gerrit.cloudera.org:8080/20434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3e9c169d65c15df6a6762cc68fbb477fe64a7c
Gerrit-Change-Number: 20434
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 30 Aug 2023 18:31:44 +0000
Gerrit-HasComments: Yes
