Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20434 )

Change subject: IMPALA-12408: Optimize HdfsScanNode.computeScanRangeLocations()
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@333
PS5, Line 333: Map<Long, List<FileDescriptor>> sampledFiles_ = null;
> Please document what is the Long key in this map represent. Looks like it i
Added comment.


http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1150
PS5, Line 1150: for (FeFsPartition partition: partitions_) {
> General question: is it worth or even possible to parallelize this loop? Ma
My guess is no. After the patch, this function doesn't seem to dominate 
planning time, at least in the queries I tested with. With a huge number of 
partitions / files it could still be slow, but there can be other 
bottlenecks too.

It would be interesting to collect profiles from a test run like tpcds; maybe 
there is other low-hanging fruit for speedup in the planner.


http://gerrit.cloudera.org:8080/#/c/20434/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1153
PS5, Line 1153: String partitionLocation = partition.getLocation();
              :       Path partitionPath = new Path(partitionLocation);
> Ok, so consistent hashCode from Java's String.hashCode() turns out to be im
Done



--
To view, visit http://gerrit.cloudera.org:8080/20434
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3e9c169d65c15df6a6762cc68fbb477fe64a7c
Gerrit-Change-Number: 20434
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 30 Aug 2023 18:31:44 +0000
Gerrit-HasComments: Yes
