Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/3/be/src/scheduling/scheduler.cc@582
PS3, Line 582: ComputeScanRangeAssignment
> This function now does both assignment and generation of scan ranges. Also,
I moved it to QueryScheduler... that seems to be the place where data about 
scheduling is stored (including the request with scan ranges from the FE). Now, 
the idea is to first generate these ranges (done in client request), then 
schedule. If there is a better place to store such per request info, let me 
know and I can easily move it there.
The only thing the scheduler knows about all of this right now is to stitch 
together computed and given ranges-- that's a bit gorpy (L258) and I plan to 
fix it if this flow/organization looks ok.


http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/3/common/thrift/PlanNodes.thrift@195
PS3, Line 195: struct THdfsFileSplitSpec {
             :   1: required CatalogObjects.THdfsFileDesc file_desc
             :   // -1 means not-splittable
             :   2: required i64 max_block_size
             :   3: required i64 partition_id
             : }
> THdfsFileDesc and THdfsFileSplit have quite a bit of intersection wrt the i
I did this for two reasons:
1) I expect this Spec to be the only thing that comes out of the FE for hdfs 
files (unexpanded) and THdfsFileSplit will only be present on the BE 
(expanded). Currently, I'm doing this just for s3/adls/local, but I'm thinking 
to apply it to all (not this change) hdfs files.
2) I preferred to use the type-checker to tell me which representation I have 
(expanded vs. unexpanded). Otherwise, I'd have to peer into THdfsFileSplit to 
determine this state, which seems more error-prone.


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@131
PS3, Line 131: new int[] {}
> I believe you don't have to do that. You can probably pass null and then av
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@338
PS3, Line 338: its
> typo: it's
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@666
PS3, Line 666: private void generateScanRangeSpecs
> No intention to be picky but can you move this and tansformBlocksToScanRang
Done


http://gerrit.cloudera.org:8080/#/c/8523/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@694
PS3, Line 694:    * TScanRangeLocationLists are added scanRanges_. If disk ids 
are expected in blocks
> to
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 12 Jan 2018 01:19:09 +0000
Gerrit-HasComments: Yes

Reply via email to