Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )
Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges ...................................................................... Patch Set 4: (4 comments) Thanks for the comments. Yeah, looking at IMPALA-6673 I don't think there should be a conflict with these changes. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h File be/src/runtime/io/file-reader.h: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49 PS3, Line 49: 'eof' is set to true when end of file has reached. : virtual Status ReadFromPos(int64_t file_offset, uint8_t* buffer, > Depending on what we do with eosr, we may need to update this comment. Updated comment, and renamed eosr to eof. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155 PS3, Line 155: *eof = true; > eosr is a bit weird now. It can detect end of file, but not end of scan ran I renamed eosr to eof. Currently file readers doesn't see how many bytes have been read (copied) from cache when the file is available in the HDFS cache. This is the main reason why this logic is implemented there. Putting that logic into the file readers doesn't seem wrong to me, we would need to change some parts of ScanRange, but nothing dramatic. What do you think? http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416 PS3, Line 416: /// Validates the sub-ranges. All sub-range must be insid > I think it would be useful to combine contiguous SubRanges into a single Su Yeah, I was thinking about it after uploading the last PS. Now I've added MergeSubRanges() which I call in AddSubRanges(). http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268 PS3, Line 268: "There should be no partial reads for sub-ranges." > About this DCHECK. We have a condition in ReadFromPos() that raises some qu My idea is to only issue such read operations (that go beyond the end of the file) when there are no sub-ranges. We should only issue sub-ranges when we exactly know what do we want to read, e.g. Parquet pages in a column chunk. Btw, hdfsRead() does a partial read in this scenario, but we only allow that for ScanRanges that don't have sub-ranges. -- To view, visit http://gerrit.cloudera.org:8080/11520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4 Gerrit-Change-Number: 11520 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 05 Oct 2018 15:14:59 +0000 Gerrit-HasComments: Yes
