Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11520 )

Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges
......................................................................


Patch Set 4:

(4 comments)

Thanks for the comments.
Yeah, looking at IMPALA-6673 I don't think there should be a conflict with 
these changes.

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h
File be/src/runtime/io/file-reader.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49
PS3, Line 49: 'eof' is set to true when end of file has reached.
            :   virtual Status ReadFromPos(int64_t file_offset, uint8_t* buffer,
> Depending on what we do with eosr, we may need to update this comment.
Updated comment, and renamed eosr to eof.


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155
PS3, Line 155:         *eof = true;
> eosr is a bit weird now. It can detect end of file, but not end of scan ran
I renamed eosr to eof.
Currently file readers doesn't see how many bytes have been read (copied) from 
cache when the file is available in the HDFS cache. This is the main reason why 
this logic is implemented there.

Putting that logic into the file readers doesn't seem wrong to me, we would 
need to change some parts of ScanRange, but nothing dramatic. What do you think?


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416
PS3, Line 416:   /// Validates the sub-ranges. All sub-range must be insid
> I think it would be useful to combine contiguous SubRanges into a single Su
Yeah, I was thinking about it after uploading the last PS.
Now I've added MergeSubRanges() which I call in AddSubRanges().


http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268
PS3, Line 268: "There should be no partial reads for sub-ranges."
> About this DCHECK. We have a condition in ReadFromPos() that raises some qu
My idea is to only issue such read operations (that go beyond the end of the 
file) when there are no sub-ranges.

We should only issue sub-ranges when we exactly know what do we want to read, 
e.g. Parquet pages in a column chunk.

Btw, hdfsRead() does a partial read in this scenario, but we only allow that 
for ScanRanges that don't have sub-ranges.



--
To view, visit http://gerrit.cloudera.org:8080/11520
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4
Gerrit-Change-Number: 11520
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 05 Oct 2018 15:14:59 +0000
Gerrit-HasComments: Yes

Reply via email to