Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582 PS3, Line 582: ComputeScanRangeAssignment > This function now does both assignment and generation of scan ranges. Also, I moved it to QueryScheduler... that seems to be the place where data about scheduling is stored (including the request with scan ranges from the FE). Now, the idea is to first generate these ranges (done in client request), then schedule. If there is a better place to store such per request info, let me know and I can easily move it there. The only thing the scheduler knows about all of this right now is to stitch together computed and given ranges-- that's a bit gorpy (L258) and I plan to fix it if this flow/organization looks ok. http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195 PS3, Line 195: struct THdfsFileSplitSpec { : 1: required CatalogObjects.THdfsFileDesc file_desc : // -1 means not-splittable : 2: required i64 max_block_size : 3: required i64 partition_id : } > THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the i I did this for two reasons: 1) I expect this Spec to be the only thing that comes out of the FE for hdfs files (unexpanded) and THdfsFileSplit will only be present on the BE (expanded). Currently, I'm doing this just for s3/adls/local, but I'm thinking to apply it to all (not this change) hdfs files. 2) I preferred to use the type-checker to tell me which representation I have (expanded vs. unexpanded). Otherwise, I'd have to peer into THdfsFileSplit to determine this state, which seems more error-prone. http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131 PS3, Line 131: new int[] {} > I believe you don't have to do that. You can probably pass null and then av Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338 PS3, Line 338: its > typo: it's Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666 PS3, Line 666: private void generateScanRangeSpecs > No intention to be picky but can you move this and tansformBlocksToScanRang Done http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694 PS3, Line 694: * TScanRangeLocationLists are added scanRanges_. If disk ids are expected in blocks > to Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 3 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Fri, 12 Jan 2018 01:19:09 +0000 Gerrit-HasComments: Yes
