Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8707 )

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
......................................................................


Patch Set 25:

(35 comments)

Thanks for the comments. Addressed most of your points but had a few minor 
follow-on questions. Still need to do a second pass over my changes for typos, 
but I thought it was worth pushing out now (it's also easier to see typos for 
me in gerrit :))

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc@1445
PS25, Line 1445: ;
> << "Already provided a buffer" was nice
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@71
PS25, Line 71: /// TODO: We should map readers to queries. A reader is the unit 
of scheduling and queries
Moved this to live with the other TODOs at the bottom.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@84
PS25, Line 84: GetNextRange: returns to the caller the next scan range it 
should process.
             : ///     This is based on disk load. This also begins reading the 
data in this scan
             : ///     range. This is blocking.
> that could probably use updating? or maybe rephrase to talk about states or
Removed some of the detail here that fits better in the method comments.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@109
PS25, Line 109: Memory usage in the IoMgr comes from queued read buffers.  If 
we queue the minimum
              : /// (i.e. 1), then the disks are idle while we are processing 
the buffer. If we don't
              : /// limit the queue, then it possible we end up queueing the 
entire data set (i.e. CPU
              : /// is slower than disks) and run out of memory.
> this seems a bit outdated
Removed this part, I think it was irrelevant in light of later sections.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@116
PS25, Line 116: In that case each query will need fewer queued buffers and
              : /// therefore have less memory usage.
              : //
> and that
Reworked this and the below paragraph. It's all a bit outdated.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@136
PS25, Line 136: a client buffer provided by the caller
              : /// when constructing the scan range.
> in that case, is it required that the single buffer can fit the entire rang
Clarified.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@147
PS25, Line 147: buffers are owned
> how does a buffer become owned by the caller? I guess that means when it's
I guess we don't actually provide a way for clients to determine how many 
buffers were allocated. It's easy to expose but we don't have a use case right 
now. I reworded this comment in terms of GetNext()/ReturnBuffer() to suggest 
the conservative behaviour for now.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@149
PS25, Line 149: it can allocate at least three
              : /// max-sized buffers per scan range
> the API below (AllocateBuffersForRange()) doesn't seem to give the caller (
Reworded slightly - agree it implied that. The AllocateBuffersForRange() 
comment does mention 3 * max_buffer_size() so that makes it a little explicit 
at least.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@269
PS25, Line 269: ,
> "on return"
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@282
PS25, Line 282: range is
              :   /// available
> what does it mean for a range to be available (from an API perspective)?
I think the bit about blocking was actually wrong (or at least extremely 
misleading) - it doesn't block indefinitely but rather gives up ASAP if there 
are no unstarted ranges.

There's a .wait() call on a condition variable, but I think that only exists to 
deal with races with ranges being pulled of queues (I'm sure there's a 
potential simplification there).


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@288
PS25, Line 288: 'needs_buffers' is set to true
> this returns true in '*needs_buffers'
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@300
PS25, Line 300: allocate
> allocated
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@474
PS25, Line 474:   /// Reads the specified scan range and calls 
HandleReadFinished() when done.
> this should probably mention how it might not do anything if there's no buf
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@375
PS25, Line 375:   // Add the range to the queue of the disk the range is on
> that doesn't seem to match the code block
Removed.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@387
PS25, Line 387:   // Can't schedule until the caller allocates some buffers for 
the range.
> that comment seems to talk about only one case handled here
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@388
PS25, Line 388:   *needs_buffers = range->external_buffer_tag_ == 
ScanRange::ExternalBufferTag::NO_BUFFER;
I realised that this is potentially weird with zero-length scan ranges. I think 
zero-length ranges currently don't work, so I clarified this in the comments 
and added at atest.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@41
PS25, Line 41: unstarted_ranges
> not your change but unstarted_scan_ranges_
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@48
PS25, Line 48: ScanRange's outgoing ready buffers is full. We can't read for 
this range
             : ///    anymore. We need the caller to pull a buffer off which 
will put this in
             : ///    the in_flight_ranges queue. These ranges are in the 
RequestContext's
             : ///    blocked_ranges_ queue.
> this looks outdated
Updated to reflect the new blocking implementation.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@53
PS25, Line 53: //
> now does the new active_ranges_ fit in to the above, if at all?
Extended the comment to better explain.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@54
PS25, Line 54: outgoing queue
> update
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@202
PS25, Line 202: added to the disk
              :   /// queue
> what do we mean by "disk queue"?  I guess the inflight queue (as opposed to
Clarified. It's the 'in_flight_ranges_' queue, but ranges there move to 
'inflight_ranges' automatically.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@209
PS25, Line 209: ScheduleContext() later to schedule the range
> is that the same as the "range is scheduled later " case above? Or is does
Yeah I think this part of the comment was vague at best and maybe just straight 
incorrect. I reworded to more concretely describe what it does in terms of 
queues and provided an example


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@303
PS25, Line 303: may have not yet
              :   /// hit eos or been cancelled
> I don't follow that part.  Is that saying they might or might not have hit
Agree it's redundant with the last paragraph, removed it here.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@457
PS25, Line 457: A ScanRange is added to this queue when it is returned in
              :     /// GetNextRange(), or when it is added with 
schedule_immediately = true.
> update
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@154
PS25, Line 154: schedule_mode
> maybe updating the " A scan range for the reader is on one of five states:"
Tried to improve the header comments. Agree that it's confusing, particularly 
the BY_CALLER. BY_CALLER seems like it could be avoided if the accounting was 
done differently.

I also realised that we don't have full test coverage for the case where 
AllocateBuffersForRange() is not called after needs_buffers=true is returned. I 
added a unit test.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@160
PS25, Line 160:     disk_state->set_done(false);
> this is surprising given the comment for done_ in the header
Do you mean because it sort-of implies that it's a terminal state? We could 
maybe rename it to be clearer. E.g. 'inactive_'. Or invert it and make it 
'active_'


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@308
PS25, Line 308:   /// Add buffers for the range to read data into.
> this will also schedule the range if needed to unblock GetNext, right?
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@311
PS25, Line 311: AddBuffers
> how about AddUnusedBuffers() to help distinguish from EnqueueBuffer()?  May
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@323
PS25, Line 323: GetNextBufferForRange
> Would be nice to have "Unused" in the name here too.
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@345
PS25, Line 345: This is invoked by RequestContext::Cancel(), which does the 
removal itself.
> why is that necessary?
RequestContext calls this while iterating over active_ranges_. If we used 
CancelFromReader(), it would call back into RequestContext and remove the entry 
from 'active_ranges_', which I think believe invalidates the iterator.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@437
PS25, Line 437: iomgr
> what's the significance of the iomgr in the name? Oh, I guess because this
Yeah, exactly. I added this to try to disambiguate. LMK if you think it's 
confusing, I'm not in love with the naming here.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@95
PS25, Line 95:   reader_->num_used_buffers_.Add(-1);
> do we actually use these counters?  seems like we do an awful lot of bookke
It looks like it's only used for debugging (validation + debug strings). I 
think the DCHECKS in the ScanRange and BufferDescriptor destructors are 
sufficient to detect memory leaks, so I don't see a real benefit.

Would be easy to rip out. I can go ahead and do that if you don't mind the 
churn on this patch or otherwise defer the work.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@116
PS25, Line 116:   bool schedule_range = !returned;
> rather than having 'returned', could we just start the scan range with bloc
Done


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h@102
PS25, Line 102: RoundDownToPowerOfTwo
> hmm it's pretty confusing that RoundDownToPowerOfTwo and RoundDownToPowerOf
Yeah I think Round{Up,Down}ToPowerOfTwoFactor would be a more appropriate name 
for the other set of functions. I used that name when adding the equivalent to 
the frontend. Should I file a JIRA to do the cleanup? Didn't want to add more 
code churn to this patch.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h@65
PS25, Line 65:    } while (false);
> hmm it's really unfortunate that we have these cases. I assume this is beca
The API allows clients to pass in their own counters, so I think some clients 
don't pass in anything. Moved to disk-io-mgr-internal.h. I also skipped 
implementing the #if ENABLE_COUNTERS thing over there since afaik it's totally 
bitrotted anyway.



--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 17 Feb 2018 00:19:55 +0000
Gerrit-HasComments: Yes

Reply via email to