Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 )
Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. ...................................................................... Patch Set 3: (14 comments) Thank you for working on this! I left some comments around clarifying things and on GetBytesInternal(). http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h@97 PS3, Line 97: /// This class also encapsulates row batch management. Subclasses should call CommitRows() nit: long line (and the new Gerrit UI breaks it now, making it look bad). http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@143 PS3, Line 143: /// (the scan range could be longer than the file). Can we extend this comment with what the implication are? Is the stream still valid afterwards? How is this different from scan_range_eosr_? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@146 PS3, Line 146: /// If true, the stream has reached the end of the file. Can we extend this comment with what the implication are? Is the stream still valid afterwards? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@161 PS3, Line 161: bool ReadBoolean(bool* boolean, Status* status); Should we add WARN_UNUSED_RESULT while you're here? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@193 PS3, Line 193: /// Release resources from previous reads in this stream. If 'done' is true all This only releases buffers that are completely read, right? In that case, wouldn't adding back "completed" be more clear? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@224 PS3, Line 224: initial "initial" implies that there are more, no? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@254 PS3, Line 254: // We always want output_buffer_bytes_left_ to be non-NULL, so we can avoid a NULL check nit: long line http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@298 PS3, Line 298: 2 Huh? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@320 PS3, Line 320: Always : /// returns the current I/O buffer to the I/O manager. This only seems true when the buffer has been read/skipped completely. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@101 PS3, Line 101: buffer_range->ReturnBuffer(move(io_buffer_)); Can we reset io_buffer_pos_ here, too? It looks to me below like we're using (io_buffer_bytes_left_ == 0, io_buffer_pos_ != nullptr) as an indicator that we're not at the end of the file. Would it make mistakes less likely if we added that state explicitly in some variable? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@182 PS3, Line 182: if (eosr()) return Status::OK(); Can you explain here (or in the header at either this function or scan_range_eosr_) why we don't use scan_range_eosr_ here? It's not clear to me. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@226 PS3, Line 226: Status ScannerContext::Stream::GetBytesInternal(int64_t requested_len, This method still looks very confusing to me. Can you think of ways to make the handling of the various cases more obvious? For example by handling case 1 (read from io_buffer) explicitly? Let's chat in person if you think it'll help. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@228 PS3, Line 228: DCHECK_GT(requested_len, boundary_buffer_bytes_left_); Can you also add a DCHECK to document and assert requested_len > io_buffer_bytes_left_? http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@260 PS3, Line 260: num_bytes This is the number of bytes we still need to copy over from the io_buffer, right? Can you think of a better name? num_bytes_left_to_copy? -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 14 Dec 2017 03:05:48 +0000 Gerrit-HasComments: Yes