Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 )
Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
......................................................................


Patch Set 25:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc@1445
PS25, Line 1445: ;
<< "Already provided a buffer" was nice


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@84
PS25, Line 84: GetNextRange: returns to the caller the next scan range it should process.
             : /// This is based on disk load. This also begins reading the data in this scan
             : /// range. This is blocking.
That could probably use updating? Or maybe rephrase to talk about states or
something rather than API calls.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@109
PS25, Line 109: Memory usage in the IoMgr comes from queued read buffers. If we queue the minimum
             : /// (i.e. 1), then the disks are idle while we are processing the buffer. If we don't
             : /// limit the queue, then it possible we end up queueing the entire data set (i.e. CPU
             : /// is slower than disks) and run out of memory.
This seems a bit outdated.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@116
PS25, Line 116: In that case each query will need fewer queued buffers and
             : /// therefore have less memory usage.
             : //
and that


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@136
PS25, Line 136: a client buffer provided by the caller
             : /// when constructing the scan range.
In that case, is it required that the single buffer can fit the entire
range?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@147
PS25, Line 147: buffers are owned
How does a buffer become owned by the caller?
I guess that means when it's returned by GetNext() but not yet
ReturnBuffer()? How does the caller know how many buffers there are in play
in order to satisfy this requirement? Or does it have to just conservatively
assume that it should only "own" one buffer at a time? So it seems we
effectively still have a client cooperation requirement in the new code
(which I think is okay). It's just that instead of leading to a reservation
violation, it would lead to a resource deadlock. I do agree that waiting on
buffers rather than on queue space is an improvement, though.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@149
PS25, Line 149: it can allocate at least three
             : /// max-sized buffers per scan range
The API below (AllocateBuffersForRange()) doesn't seem to give the caller
(explicit) control over how many buffers are allocated. I guess it's
implicit because of the maximum IO buffer size?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@269
PS25, Line 269: ,
"on return" or "If 'needs_buffers' is returned as true" to make it clearer
that this is an out-only param.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@282
PS25, Line 282: range is
             : /// available
What does it mean for a range to be available (from an API perspective)?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@288
PS25, Line 288: 'needs_buffers' is set to true
this returns true in '*needs_buffers'


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@300
PS25, Line 300: allocate
allocated


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@474
PS25, Line 474: /// Reads the specified scan range and calls HandleReadFinished() when done.
This should probably mention how it might not do anything if there's no
buffer available.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@375
PS25, Line 375: // Add the range to the queue of the disk the range is on
That doesn't seem to match the code block.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@387
PS25, Line 387: // Can't schedule until the caller allocates some buffers for the range.
That comment seems to talk about only one of the cases handled here.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@391
PS25, Line 391: *needs_buffers ? ScheduleMode::BY_CALLER
Why is this case different than when AddScanRanges() has NO_BUFFER (which
does UPON_GETNEXT)? I guess in this case we don't want it to go through the
next_scan_range_to_start() logic. But where does ScheduleRange() end up
being called in this case? Oh, I guess via AddBuffers(), which is like
UPON_GETNEXT. So the BY_CALLER name seems kind of misleading, since
UPON_GETNEXT also ultimately falls back to that case for scheduling the
scan range.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@41
PS25, Line 41: unstarted_ranges
Not your change, but: unstarted_scan_ranges_


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@48
PS25, Line 48: ScanRange's outgoing ready buffers is full. We can't read for this range
             : /// anymore. We need the caller to pull a buffer off which will put this in
             : /// the in_flight_ranges queue. These ranges are in the RequestContext's
             : /// blocked_ranges_ queue.
This looks outdated.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@53
PS25, Line 53: //
now, does the new active_ranges_ fit into the above, if at all?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@54
PS25, Line 54: outgoing queue
update


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@202
PS25, Line 202: added to the disk
             : /// queue
What do we mean by "disk queue"? I guess the in-flight queue (as opposed to
the unstarted queue)?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@209
PS25, Line 209: ScheduleContext() later to schedule the range
Is that the same as the "range is scheduled later" case above? Or does
"scheduled" mean both added to the queue and call ScheduleContext(), whereas
in this case it's added to the queue already and the caller only needs to
call ScheduleContext()? I guess it's hard to understand what exactly we mean
by "schedule" - does it always mean both enqueue on the PerDiskState queue
and schedule the context, or does it mean something different?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@303
PS25, Line 303: may have not yet
             : /// hit eos or been cancelled
I don't follow that part. Is that saying they might or might not have hit
eos? Or that they absolutely haven't hit eos if they are in this set? Okay,
I guess the last paragraph talks about this, but the statement still seems
confusing.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@457
PS25, Line 457: A ScanRange is added to this queue when it is returned in
             : /// GetNextRange(), or when it is added with schedule_immediately = true.
update


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@154
PS25, Line 154: schedule_mode
Having these three cases is a bit confusing; note to self to think about
that some more.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@160
PS25, Line 160: disk_state->set_done(false);
This is surprising given the comment for done_ in the header.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@308
PS25, Line 308: /// Add buffers for the range to read data into.
This will also schedule the range if needed to unblock GetNext(), right?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@311
PS25, Line 311: AddBuffers
How about AddUnusedBuffers() to help distinguish from EnqueueBuffer()? Maybe
even rename EnqueueBuffer() to EnqueueReadyBuffer() or AddReadyBuffer() to
further explain.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@323
PS25, Line 323: GetNextBufferForRange
Would be nice to have "Unused" in the name here too.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@345
PS25, Line 345: This is invoked by RequestContext::Cancel(), which does the removal itself.
Why is that necessary?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@437
PS25, Line 437: iomgr
What's the significance of the "iomgr" in the name? Oh, I guess because this
is only used for "IO-mgr allocated" buffers (implied by NO_BUFFER)?
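To make the naming suggestion concrete: a toy sketch of the two buffer
directions a scan range juggles. This is not Impala's actual code; ToyScanRange
and everything in it are hypothetical names invented purely to illustrate why
"unused" (empty buffers the caller donated for reads) versus "ready" (buffers
holding completed read data) is a useful distinction to bake into the method
names.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Hypothetical illustration only. "Unused" buffers are empty buffers the
// caller has handed over for I/O threads to read into; "ready" buffers hold
// completed read data waiting for the caller to consume.
class ToyScanRange {
 public:
  // Caller donates 'n' empty buffers that reads can fill. This is the role
  // the AddUnusedBuffers() name would make explicit.
  void AddUnusedBuffers(int n) { unused_ += n; }

  // An I/O thread consumes one unused buffer, "fills" it with 'data', and
  // queues it for the caller (the EnqueueReadyBuffer() role). Returns false
  // if the range is blocked because no unused buffer is available.
  bool EnqueueReadyBuffer(const std::string& data) {
    if (unused_ == 0) return false;
    --unused_;
    ready_.push(data);
    return true;
  }

  // Caller pops the next completed buffer; after consuming it, the caller
  // can recycle the buffer via AddUnusedBuffers() to keep reads going.
  bool GetNext(std::string* out) {
    if (ready_.empty()) return false;
    *out = ready_.front();
    ready_.pop();
    return true;
  }

 private:
  int unused_ = 0;                 // empty buffers available for reads
  std::queue<std::string> ready_;  // completed reads awaiting the caller
};
```

The point of the names is that the two queues move in opposite directions
(the caller feeds the unused side, I/O threads feed the ready side), and a
range whose unused side is empty is exactly the blocked, needs-buffers state
discussed in the comments above.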
http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@95
PS25, Line 95: reader_->num_used_buffers_.Add(-1);
Do we actually use these counters? Seems like we do an awful lot of
bookkeeping. Anyway, cleaning that up is out of scope.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h@102
PS25, Line 102: RoundDownToPowerOfTwo
Hmm, it's pretty confusing that RoundDownToPowerOfTwo and
RoundDownToPowerOf2 are different things (though admittedly the existing
names seem wrong).


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h@65
PS25, Line 65: } while (false);
Hmm, it's really unfortunate that we have these cases. I assume this is
because of tests? How about defining these local to disk-io-mgr.cc so as not
to encourage this? (And a TODO that says we should just make these things
always non-null.)


--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 15 Feb 2018 06:31:55 +0000
Gerrit-HasComments: Yes
