Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 )
Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
......................................................................


Patch Set 25:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/exec/hdfs-parquet-scanner.cc@1445
PS25, Line 1445: ;
<< "Already provided a buffer" was nice


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h
File be/src/runtime/io/disk-io-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@84
PS25, Line 84: GetNextRange: returns to the caller the next scan range it should process.
             : /// This is based on disk load. This also begins reading the data in this scan
             : /// range. This is blocking.
That could probably use updating? Or maybe rephrase to talk about states or
something rather than API calls.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@109
PS25, Line 109: Memory usage in the IoMgr comes from queued read buffers. If we queue the minimum
             : /// (i.e. 1), then the disks are idle while we are processing the buffer. If we don't
             : /// limit the queue, then it possible we end up queueing the entire data set (i.e. CPU
             : /// is slower than disks) and run out of memory.
This seems a bit outdated.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@116
PS25, Line 116: In that case each query will need fewer queued buffers and
             : /// therefore have less memory usage.
             : //
and that


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@136
PS25, Line 136: a client buffer provided by the caller
             : /// when constructing the scan range.
In that case, is it required that the single buffer can fit the entire
range?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@147
PS25, Line 147: buffers are owned
How does a buffer become owned by the caller?
I guess that means when it's returned by GetNext() but not yet
ReturnBuffer()? How does the caller know how many buffers there are in play
in order to satisfy this requirement? Or does it have to just conservatively
assume that it should only "own" one buffer at a time? So it seems we
effectively still have a client cooperation requirement in the new code
(which I think is okay). It's just that instead of leading to a reservation
violation, it would lead to a resource deadlock. I do agree that waiting on
buffers rather than on queue space is an improvement, though.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@149
PS25, Line 149: it can allocate at least three
             : /// max-sized buffers per scan range
The API below (AllocateBuffersForRange()) doesn't seem to give the caller
(explicit) control over how many buffers are allocated. I guess it's
implicit because of the maximum IO buffer size?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@269
PS25, Line 269: ,
"on return" or "If 'needs_buffers' is returned as true" to make it clearer
that this is an out-only param.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@282
PS25, Line 282: range is
             : /// available
What does it mean for a range to be available (from an API perspective)?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@288
PS25, Line 288: 'needs_buffers' is set to true
this returns true in '*needs_buffers'


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@300
PS25, Line 300: allocate
allocated


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.h@474
PS25, Line 474: /// Reads the specified scan range and calls HandleReadFinished() when done.
This should probably mention how it might not do anything if there's no
buffer available.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@375
PS25, Line 375: // Add the range to the queue of the disk the range is on
That doesn't seem to match the code block.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@387
PS25, Line 387: // Can't schedule until the caller allocates some buffers for the range.
That comment seems to talk about only one of the cases handled here.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/disk-io-mgr.cc@391
PS25, Line 391: *needs_buffers ? ScheduleMode::BY_CALLER
Why is this case different than when AddScanRanges() has NO_BUFFER (which
does UPON_GETNEXT)? I guess in this case we don't want it to go through the
next_scan_range_to_start() logic. But where does ScheduleRange() end up
being called in this case? Oh, I guess via AddBuffers(), which is like
UPON_GETNEXT. So the BY_CALLER name seems kind of misleading, since
UPON_GETNEXT also ultimately falls back to that case for scheduling the
scan range.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@41
PS25, Line 41: unstarted_ranges
Not your change, but: unstarted_scan_ranges_


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@48
PS25, Line 48: ScanRange's outgoing ready buffers is full. We can't read for this range
             : /// anymore. We need the caller to pull a buffer off which will put this in
             : /// the in_flight_ranges queue. These ranges are in the RequestContext's
             : /// blocked_ranges_ queue.
This looks outdated.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@53
PS25, Line 53: //
now, does the new active_ranges_ fit into the above, if at all?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@54
PS25, Line 54: outgoing queue
update


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@202
PS25, Line 202: added to the disk
             : /// queue
What do we mean by "disk queue"? I guess the in-flight queue (as opposed to
the unstarted queue)?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@209
PS25, Line 209: ScheduleContext() later to schedule the range
Is that the same as the "range is scheduled later" case above? Or does
"scheduled" mean both added to the queue and call ScheduleContext(), whereas
in this case it's added to the queue already and the caller only needs to
call ScheduleContext()? I guess it's hard to understand what exactly we mean
by "schedule" - does it always mean both enqueue on the PerDiskState queue
and schedule the context, or does it mean something different?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@303
PS25, Line 303: may have not yet
             : /// hit eos or been cancelled
I don't follow that part. Is that saying they might or might not have hit
eos? Or that they absolutely haven't hit eos if they are in this set? Okay,
I guess the last paragraph talks about this, but the statement still seems
confusing.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.h@457
PS25, Line 457: A ScanRange is added to this queue when it is returned in
             : /// GetNextRange(), or when it is added with schedule_immediately = true.
update


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc
File be/src/runtime/io/request-context.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@154
PS25, Line 154: schedule_mode
Having these three cases is a bit confusing; note to self to think about
that some more.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-context.cc@160
PS25, Line 160: disk_state->set_done(false);
This is surprising given the comment for done_ in the header.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@308
PS25, Line 308: /// Add buffers for the range to read data into.
This will also schedule the range if needed to unblock GetNext(), right?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@311
PS25, Line 311: AddBuffers
How about AddUnusedBuffers() to help distinguish from EnqueueBuffer()? Maybe
even rename EnqueueBuffer() to EnqueueReadyBuffer() or AddReadyBuffer() to
further explain.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@323
PS25, Line 323: GetNextBufferForRange
Would be nice to have "Unused" in the name here too.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@345
PS25, Line 345: This is invoked by RequestContext::Cancel(), which does the removal itself.
Why is that necessary?


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/request-ranges.h@437
PS25, Line 437: iomgr
What's the significance of the "iomgr" in the name? Oh, I guess because this
is only used for "IO-mgr allocated" buffers (implied by NO_BUFFER)?
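To make the naming suggestion concrete: a toy sketch of the two buffer
directions a scan range juggles. This is not Impala's actual code; ToyScanRange
and everything in it are hypothetical names invented purely to illustrate why
"unused" (empty buffers the caller donated for reads) versus "ready" (buffers
holding completed read data) is a useful distinction to bake into the method
names.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Hypothetical illustration only. "Unused" buffers are empty buffers the
// caller has handed over for I/O threads to read into; "ready" buffers hold
// completed read data waiting for the caller to consume.
class ToyScanRange {
 public:
  // Caller donates 'n' empty buffers that reads can fill. This is the role
  // the AddUnusedBuffers() name would make explicit.
  void AddUnusedBuffers(int n) { unused_ += n; }

  // An I/O thread consumes one unused buffer, "fills" it with 'data', and
  // queues it for the caller (the EnqueueReadyBuffer() role). Returns false
  // if the range is blocked because no unused buffer is available.
  bool EnqueueReadyBuffer(const std::string& data) {
    if (unused_ == 0) return false;
    --unused_;
    ready_.push(data);
    return true;
  }

  // Caller pops the next completed buffer; after consuming it, the caller
  // can recycle the buffer via AddUnusedBuffers() to keep reads going.
  bool GetNext(std::string* out) {
    if (ready_.empty()) return false;
    *out = ready_.front();
    ready_.pop();
    return true;
  }

 private:
  int unused_ = 0;                 // empty buffers available for reads
  std::queue<std::string> ready_;  // completed reads awaiting the caller
};
```

The point of the names is that the two queues move in opposite directions
(the caller feeds the unused side, I/O threads feed the ready side), and a
range whose unused side is empty is exactly the blocked, needs-buffers state
discussed in the comments above.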
http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/runtime/io/scan-range.cc@95
PS25, Line 95: reader_->num_used_buffers_.Add(-1);
Do we actually use these counters? Seems like we do an awful lot of
bookkeeping. Anyway, cleaning that up is out of scope.


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h
File be/src/util/bit-util.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/bit-util.h@102
PS25, Line 102: RoundDownToPowerOfTwo
Hmm, it's pretty confusing that RoundDownToPowerOfTwo and
RoundDownToPowerOf2 are different things (though admittedly the existing
names seem wrong).


http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/8707/25/be/src/util/runtime-profile-counters.h@65
PS25, Line 65: } while (false);
Hmm, it's really unfortunate that we have these cases. I assume this is
because of tests? How about defining these local to disk-io-mgr.cc so as not
to encourage this? (And a TODO that says we should just make these things
always non-null.)


--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 15 Feb 2018 06:31:55 +0000
Gerrit-HasComments: Yes
