Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11520 )
Change subject: IMPALA-7543: Enhance scan ranges to support sub-ranges ...................................................................... Patch Set 3: (4 comments) I'm making my way through this. First round of comments. At a high level, I think another dimension of the design is how it interacts with future efforts to manage IO patterns (e.g. IMPALA-6673). I don't think that should hold up this review, and I don't see a problem with the SubRange abstraction. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h File be/src/runtime/io/file-reader.h: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/file-reader.h@49 PS3, Line 49: 'eosr' is set to true when end of file has reached, : /// or the file reader has read all the bytes needed by 'scan_range_'. Depending on what we do with eosr, we may need to update this comment. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/hdfs-file-reader.cc@155 PS3, Line 155: *eosr = true; eosr is a bit weird now. It can detect end of file, but not end of scan range. I'll be taking another look at eosr and how it gets set, but we might end up renaming this variable to be eof. I might be missing something, but the eosr logic seems like it could be implemented here. We have access to the scan_range_ and can increment bytes_read_. The ScanRange has bytes_to_read_ which is similar to len() today. http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/request-ranges.h@416 PS3, Line 416: /// They need to be ordered by offset and cannot overlap. I think it would be useful to combine contiguous SubRanges into a single SubRange. (It doesn't necessarily need to happen in this code change.) Should we require that the SubRanges are non-contiguous (forcing the caller to merge them) or should we merge them ourselves? (Is there a time when we wouldn't want to merge contiguous SubRanges?) http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/11520/3/be/src/runtime/io/scan-range.cc@268 PS3, Line 268: "There should be no partial reads for sub-ranges." > Yeah, I think it's not necessary to test eosr in the loop condition, but ot About this DCHECK. We have a condition in ReadFromPos() that raises some questions for me: https://github.com/apache/impala/blob/master/be/src/runtime/io/hdfs-file-reader.cc#L152 I'm reading into what would trigger that condition. My current thought: If we issue an hdfsRead() call where the offset is near the end of the file and the length goes beyond the end of the file, does hdfsRead() produce an error or does it do a partial read and return less than the requested amount of data? -- To view, visit http://gerrit.cloudera.org:8080/11520 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea26ba386713990f7671aab5a372cf449b8d51e4 Gerrit-Change-Number: 11520 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 05 Oct 2018 00:28:14 +0000 Gerrit-HasComments: Yes
