Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 )
Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront ...................................................................... Patch Set 25: (35 comments) Thanks for the comments. Addressed most of your points but had a few minor follow-on questions. Still need to do a second pass over my changes for typos, but I thought it was worth pushing out now (it's also easier to see typos for me in gerrit :)) http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc@1445 PS25, Line 1445: ; > << "Already provided a buffer" was nice Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@71 PS25, Line 71: /// TODO: We should map readers to queries. A reader is the unit of scheduling and queries Moved this to live with the other TODOs at the bottom. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@84 PS25, Line 84: GetNextRange: returns to the caller the next scan range it should process. : /// This is based on disk load. This also begins reading the data in this scan : /// range. This is blocking. > that could probably use updating? or maybe rephrase to talk about states or Removed some of the detail here that fits better in the method comments. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@109 PS25, Line 109: Memory usage in the IoMgr comes from queued read buffers. If we queue the minimum : /// (i.e. 1), then the disks are idle while we are processing the buffer. If we don't : /// limit the queue, then it possible we end up queueing the entire data set (i.e. CPU : /// is slower than disks) and run out of memory. > this seems a bit outdated Removed this part, I think it was irrelevant in light of later sections. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@116 PS25, Line 116: In that case each query will need fewer queued buffers and : /// therefore have less memory usage. : // > and that Reworked this and the below paragraph. It's all a bit outdated. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@136 PS25, Line 136: a client buffer provided by the caller : /// when constructing the scan range. > in that case, is it required that the single buffer can fit the entire rang Clarified. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@147 PS25, Line 147: buffers are owned > how does a buffer become owned by the caller? I guess that means when it's I guess we don't actually provide a way for clients to determine how many buffers were allocated. It's easy to expose but we don't have a use case right now. I reworded this comment in terms of GetNext()/ReturnBuffer() to suggest the conservative behaviour for now. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@149 PS25, Line 149: it can allocate at least three : /// max-sized buffers per scan range > the API below (AllocateBuffersForRange()) doesn't seem to give the caller ( Reworded slightly - agree it implied that. The AllocateBuffersForRange() comment does mention 3 * max_buffer_size() so that makes it a little explicit at least. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@269 PS25, Line 269: , > "on return" Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@282 PS25, Line 282: range is : /// available > what does it mean for a range to be available (from an API perspective)? I think the bit about blocking was actually wrong (or at least extremely misleading) - it doesn't block indefinitely but rather gives up ASAP if there are no unstarted ranges. There's a .wait() call on a condition variable, but I think that only exists to deal with races with ranges being pulled of queues (I'm sure there's a potential simplification there). http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@288 PS25, Line 288: 'needs_buffers' is set to true > this returns true in '*needs_buffers' Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@300 PS25, Line 300: allocate > allocated Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@474 PS25, Line 474: /// Reads the specified scan range and calls HandleReadFinished() when done. > this should probably mention how it might not do anything if there's no buf Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@375 PS25, Line 375: // Add the range to the queue of the disk the range is on > that doesn't seem to match the code block Removed. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@387 PS25, Line 387: // Can't schedule until the caller allocates some buffers for the range. > that comment seems to talk about only one case handled here Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@388 PS25, Line 388: *needs_buffers = range->external_buffer_tag_ == ScanRange::ExternalBufferTag::NO_BUFFER; I realised that this is potentially weird with zero-length scan ranges. I think zero-length ranges currently don't work, so I clarified this in the comments and added at atest. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@41 PS25, Line 41: unstarted_ranges > not your change but unstarted_scan_ranges_ Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@48 PS25, Line 48: ScanRange's outgoing ready buffers is full. We can't read for this range : /// anymore. We need the caller to pull a buffer off which will put this in : /// the in_flight_ranges queue. These ranges are in the RequestContext's : /// blocked_ranges_ queue. > this looks outdated Updated to reflect the new blocking implementation. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@53 PS25, Line 53: // > now does the new active_ranges_ fit in to the above, if at all? Extended the comment to better explain. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@54 PS25, Line 54: outgoing queue > update Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@202 PS25, Line 202: added to the disk : /// queue > what do we mean by "disk queue"? I guess the inflight queue (as opposed to Clarified. It's the 'in_flight_ranges_' queue, but ranges there move to 'inflight_ranges' automatically. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@209 PS25, Line 209: ScheduleContext() later to schedule the range > is that the same as the "range is scheduled later " case above? Or is does Yeah I think this part of the comment was vague at best and maybe just straight incorrect. I reworded to more concretely describe what it does in terms of queues and provided an example http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@303 PS25, Line 303: may have not yet : /// hit eos or been cancelled > I don't follow that part. Is that saying they might or might not have hit Agree it's redundant with the last paragraph, removed it here. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@457 PS25, Line 457: A ScanRange is added to this queue when it is returned in : /// GetNextRange(), or when it is added with schedule_immediately = true. > update Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@154 PS25, Line 154: schedule_mode > maybe updating the " A scan range for the reader is on one of five states:" Tried to improve the header comments. Agree that it's confusing, particularly the BY_CALLER. BY_CALLER seems like it could be avoided if the accounting was done differently. I also realised that we don't have full test coverage for the case where AllocateBuffersForRange() is not called after needs_buffers=true is returned. I added a unit test. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@160 PS25, Line 160: disk_state->set_done(false); > this is surprising given the comment for done_ in the header Do you mean because it sort-of implies that it's a terminal state? We could maybe rename it to be clearer. E.g. 'inactive_'. Or invert it and make it 'active_' http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@308 PS25, Line 308: /// Add buffers for the range to read data into. > this will also schedule the range if needed to unblock GetNext, right? Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@311 PS25, Line 311: AddBuffers > how about AddUnusedBuffers() to help distinguish from EnqueueBuffer()? May Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@323 PS25, Line 323: GetNextBufferForRange > Would be nice to have "Unused" in the name here too. Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@345 PS25, Line 345: This is invoked by RequestContext::Cancel(), which does the removal itself. > why is that necessary? RequestContext calls this while iterating over active_ranges_. If we used CancelFromReader(), it would call back into RequestContext and remove the entry from 'active_ranges_', which I think believe invalidates the iterator. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@437 PS25, Line 437: iomgr > what's the significance of the iomgr in the name? Oh, I guess because this Yeah, exactly. I added this to try to disambiguate. LMK if you think it's confusing, I'm not in love with the naming here. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@95 PS25, Line 95: reader_->num_used_buffers_.Add(-1); > do we actually use these counters? seems like we do an awful lot of bookke It looks like it's only used for debugging (validation + debug strings). I think the DCHECKS in the ScanRange and BufferDescriptor destructors are sufficient to detect memory leaks, so I don't see a real benefit. Would be easy to rip out. I can go ahead and do that if you don't mind the churn on this patch or otherwise defer the work. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@116 PS25, Line 116: bool schedule_range = !returned; > rather than having 'returned', could we just start the scan range with bloc Done http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h File be/src/util/bit-util.h: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h@102 PS25, Line 102: RoundDownToPowerOfTwo > hmm it's pretty confusing that RoundDownToPowerOfTwo and RoundDownToPowerOf Yeah I think Round{Up,Down}ToPowerOfTwoFactor would be a more appropriate name for the other set of functions. I used that name when adding the equivalent to the frontend. Should I file a JIRA to do the cleanup? Didn't want to add more code churn to this patch. http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h@65 PS25, Line 65: } while (false); > hmm it's really unfortunate that we have these cases. I assume this is beca The API allows clients to pass in their own counters, so I think some clients don't pass in anything. Moved to disk-io-mgr-internal.h. I also skipped implementing the #if ENABLE_COUNTERS thing over there since afaik it's totally bitrotted anyway. -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 25 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Sat, 17 Feb 2018 00:19:55 +0000 Gerrit-HasComments: Yes
