Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582 PS3, Line 582: ComputeScanRangeAssignment This function now does both assignment and generation of scan ranges. Also, it feels kind of weird to have the generation part in the scheduler. I think it may be better to separate these two concepts and move the scan range generation part out of here. http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195 PS3, Line 195: struct THdfsFileSplitSpec { : 1: required CatalogObjects.THdfsFileDesc file_desc : // -1 means not-splittable : 2: required i64 max_block_size : 3: required i64 partition_id : } THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the information they store. I am wondering if it would be simpler to modify THdfsFileSplit to handle both cases and avoid THdfsFileSplitSpec altogether. http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131 PS3, Line 131: new int[] {} I believe you don't have to do that. You can probably pass null and then avoid L143 and L150 with couple of checks. http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338 PS3, Line 338: its typo: it's http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666 PS3, Line 666: private void generateScanRangeSpecs No intention to be picky but can you move this and tansformBlocksToScanRanges below function computeScanRangeLocations so that the code reads in a top-down fashion? thanks http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694 PS3, Line 694: * TScanRangeLocationLists are added scanRanges_. If disk ids are expected in blocks to -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 3 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Wed, 03 Jan 2018 00:17:59 +0000 Gerrit-HasComments: Yes
