Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8707 )

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront

Patch Set 25:


File be/src/exec/hdfs-parquet-scanner.cc:

PS25, Line 1445: ;
<< "Already provided a buffer" was nice

File be/src/runtime/io/disk-io-mgr.h:

PS25, Line 84: GetNextRange: returns to the caller the next scan range it 
should process.
             : ///     This is based on disk load. This also begins reading the 
data in this scan
             : ///     range. This is blocking.
that could probably use updating? or maybe rephrase to talk about states or 
something rather than API calls.

PS25, Line 109: Memory usage in the IoMgr comes from queued read buffers.  If 
we queue the minimum
              : /// (i.e. 1), then the disks are idle while we are processing 
the buffer. If we don't
              : /// limit the queue, then it possible we end up queueing the 
entire data set (i.e. CPU
              : /// is slower than disks) and run out of memory.
this seems a bit outdated

PS25, Line 116: In that case each query will need fewer queued buffers and
              : /// therefore have less memory usage.
              : //
and that

PS25, Line 136: a client buffer provided by the caller
              : /// when constructing the scan range.
in that case, is it required that the single buffer can fit the entire range?

PS25, Line 147: buffers are owned
how does a buffer become owned by the caller? I guess that means when it's 
returned by GetNext() but not yet ReturnBuffer()?  How does the caller know how 
many buffers there are in play in order to satisfy this requirement?  or does 
it have to just conservatively assume that it should only "own" one buffer at a 

So it seems we effectively still have a client cooperation requirement in the 
new code (which I think is okay). It's just that instead of leading to a 
reservation violation, it would lead to a resource deadlock. I do agree the 
waiting on buffers rather than on queue space is an improvement though.

PS25, Line 149: it can allocate at least three
              : /// max-sized buffers per scan range
the API below (AllocateBuffersForRange()) doesn't seem to give the caller 
(explicit) control over how many buffers are allocated. I guess it's implicit 
because of the maximum IO buffer size?

PS25, Line 269: ,
"on return"
or "If 'needs_buffers' is returned as true"
to make it clearer this is an out-only param

PS25, Line 282: range is
              :   /// available
what does it mean for a range to be available (from an API perspective)?

PS25, Line 288: 'needs_buffers' is set to true
this returns true in '*needs_buffers'

PS25, Line 300: allocate

PS25, Line 474:   /// Reads the specified scan range and calls 
HandleReadFinished() when done.
this should probably mention how it might not do anything if there's no buffer 

File be/src/runtime/io/disk-io-mgr.cc:

PS25, Line 375:   // Add the range to the queue of the disk the range is on
that doesn't seem to match the code block

PS25, Line 387:   // Can't schedule until the caller allocates some buffers for 
the range.
that comment seems to talk about only one case handled here

PS25, Line 391: *needs_buffers ? ScheduleMode::BY_CALLER
why is this case different than when AddScanRanges() has NO_BUFFER (which does 
UPON_GETNEXT)?  I guess in this case we don't want it to go through the 
next_scan_range_to_start() logic.  But where does ScheduleRange() end up being 
called in this case? Oh, I guess via AddBuffers(), which is like UPON_GETNEXT.  
So, BY_CALLER name seems kind of misleading since UPON_GETNEXT also falls back 
to that case ultimately for scheduling the scan range.

File be/src/runtime/io/request-context.h:

PS25, Line 41: unstarted_ranges
not your change but unstarted_scan_ranges_

PS25, Line 48: ScanRange's outgoing ready buffers is full. We can't read for 
this range
             : ///    anymore. We need the caller to pull a buffer off which 
will put this in
             : ///    the in_flight_ranges queue. These ranges are in the 
             : ///    blocked_ranges_ queue.
this looks outdated

PS25, Line 53: //
now does the new active_ranges_ fit in to the above, if at all?

PS25, Line 54: outgoing queue

PS25, Line 202: added to the disk
              :   /// queue
what do we mean by "disk queue"?  I guess the inflight queue (as opposed to the 
unstarted queue?

PS25, Line 209: ScheduleContext() later to schedule the range
is that the same as the "range is scheduled later " case above? Or is does 
"scheduled" mean both added to the queue and call ScheduleContext() whereas in 
this case it's added to the queue already and the caller only need to call 

I guess it's hard to understand what exactly we mean by "schedule" - does it 
always mean both enqueue on the PerDiskState queue and schedule the context, or 
does it mean something different?

PS25, Line 303: may have not yet
              :   /// hit eos or been cancelled
I don't follow that part.  Is that saying they might or might not have hit eos? 
or that they absolutely haven't hit eos if they are in this set?
Okay, I guess the last paragraph talks about this, but this still statement 
seems confusing.

PS25, Line 457: A ScanRange is added to this queue when it is returned in
              :     /// GetNextRange(), or when it is added with 
schedule_immediately = true.

File be/src/runtime/io/request-context.cc:

PS25, Line 154: schedule_mode
having these three cases is a bit confusing, note to self to think about that 
some more.

PS25, Line 160:     disk_state->set_done(false);
this is surprising given the comment for done_ in the header

File be/src/runtime/io/request-ranges.h:

PS25, Line 308:   /// Add buffers for the range to read data into.
this will also schedule the range if needed to unblock GetNext, right?

PS25, Line 311: AddBuffers
how about AddUnusedBuffers() to help distinguish from EnqueueBuffer()?  Maybe 
even rename EnqueueBuffer() to EnqueueReadyBuffer() or AddReadyBuffer() to 
further explain.

PS25, Line 323: GetNextBufferForRange
Would be nice to have "Unused" in the name here too.

PS25, Line 345: This is invoked by RequestContext::Cancel(), which does the 
removal itself.
why is that necessary?

PS25, Line 437: iomgr
what's the significance of the iomgr in the name? Oh, I guess because this is 
only used for "IO-mgr allocated" buffers (implied by NO_BUFFER)?

File be/src/runtime/io/scan-range.cc:

PS25, Line 95:   reader_->num_used_buffers_.Add(-1);
do we actually use these counters?  seems like we do an awful lot of 
bookkeeping. anyway, cleaning that up is out of scope.

File be/src/util/bit-util.h:

PS25, Line 102: RoundDownToPowerOfTwo
hmm it's pretty confusing that RoundDownToPowerOfTwo and RoundDownToPowerOf2 
are different things (though admittedly the existing names seem wrong).

File be/src/util/runtime-profile-counters.h:

PS25, Line 65:    } while (false);
hmm it's really unfortunate that we have these cases. I assume this is because 
of tests?  How about defining these local to disk-io-mgr.cc as to not encourage 
this? (and a todo that say we should just make these things are always not 

To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 25
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Feb 2018 06:31:55 +0000
Gerrit-HasComments: Yes

Reply via email to