Vuk Ercegovac has posted comments on this change. ( )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls

Patch Set 9:


main changes: (1) minimized changes to the main scheduler method with an 
iterator, (2) added backend tests
Commit Message:
PS9, Line 30: Testing:
> Do we have a test that covers mixed queries? Have you considered adding a t
thx for the suggestions. added tests to there are end-to-end 
tests for testing multiple file systems in the same table/query (see 
File be/src/scheduling/scheduler.h:
PS9, Line 450: instances
> nit: instance
File be/src/scheduling/
PS9, Line 257:       int num_ranges = 0;
> Have you considered generating the scan ranges here and passing them into C
I tried something like this, but it created two pases so I backed it out. After 
discussing, so that we avoid additional complexity to the main 
ComputeRangeAssignment method, I wrapped the original scan ranges in an 
iterator that generates new scan ranges or returns the original ones.
PS9, Line 504: // expanded_locations. 'spec' is assumed to *not* have block 
location information.
> missing rename?
PS9, Line 504: 'spec' is assumed to *not* have block location information.
> It doesn't have a member to do so. Is this a stale comment?
removed. this refers to longer range plan where hdfs files (with block info) 
will also be handled. the todo in the thrift spec covers this.
PS9, Line 505: HDFS
> This comment seems a bit misleading, since S3 etc are not HDFS filesystems.
PS9, Line 516:   long scan_range_offset = 0;
> Any reason to not use int64_t here?
PS9, Line 518:   long scan_range_length = std::min(spec.max_block_size, 
> I'm not sure it can happen, but if spec.max_block_size == 0, the loop below
good point. added a check for this in FE as well as here.
PS9, Line 582: Defer
> "Defer" read to me like if something is being done immediately. Can you cla
PS9, Line 603:     *num_ranges += 1;
> If you move this count to L676 right after the call to RecordScanRangeAssig
got rid of this part of the api. the number of scan ranges is now accessed via 
the iterator.
PS9, Line 681:     remote_scan_range_locations.push_back(&location);
> nit: use vector<>::insert() instead of a loop here.
PS9, Line 686:     // Allow exec_at_coord to take precedence over selecting a 
remote host.
> This should only happen for generated scan ranges. Can you add a DCHECK to 
File common/thrift/PlanNodes.thrift:
PS9, Line 204:   // -1 means not-splittable
> nit: add newlines above comments.
PS9, Line 205:   2: required i64 max_block_size
> Could this ever be infinity? I feel it might be clearer to have a boolean f

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Vuk Ercegovac <>
Gerrit-Comment-Date: Mon, 12 Feb 2018 08:40:29 +0000
Gerrit-HasComments: Yes

Reply via email to