Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 3:

(4 comments)

I'm making my way through this. First round of comments.

At a high level, I think another dimension of the design is how it interacts 
with future efforts to manage IO patterns (e.g. IMPALA-6673). I don't think 
that should hold up this review, and I don't see a problem with the SubRange 
abstraction.

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49
PS3, Line 49: 'eosr' is set to true when end of file has reached,
            :   /// or the file reader has read all the bytes needed by 
'scan_range_'.
Depending on what we do with eosr, we may need to update this comment.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eosr = true;
eosr is a bit weird now. It can detect end of file, but not end of scan range. 
I'll be taking another look at eosr and how it gets set, but we might end up 
renaming this variable to be eof.

I might be missing something, but the eosr logic seems like it could be 
implemented here. We have access to the scan_range_ and can increment 
bytes_read_. The ScanRange has bytes_to_read_ which is similar to len() today.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416
PS3, Line 416:   /// They need to be ordered by offset and cannot overlap.
I think it would be useful to combine contiguous SubRanges into a single 
SubRange. (It doesn't necessarily need to happen in this code change.) Should 
we require that the SubRanges are non-contiguous (forcing the caller to merge 
them) or should we merge them ourselves?

(Is there a time when we wouldn't want to merge contiguous SubRanges?)


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> Yeah, I think it's not necessary to test eosr in the loop condition, but ot
About this DCHECK. We have a condition in ReadFromPos() that raises some 
questions for me:

https://github.com/apache/impala/blob/master/be/src/runtime/io/hdfs-file-reader.cc#L152

I'm reading into what would trigger that condition.

My current thought: If we issue an hdfsRead() call where the offset is near the 
end of the file and the length goes beyond the end of the file, does hdfsRead() 
produce an error or does it do a partial read and return less than the 
requested amount of data?



--
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 05 Oct 2018 00:28:14 +0000
Gerrit-HasComments: Yes

Reply via email to