Alexey Serbin has posted comments on this change.
Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode
Patch Set 22:
Thank you for the review.
I'll try to take a fresh look in the context of simplifying the code.
Frankly, it would help a lot if it were clear from the very beginning that the
KuduSession interface is not supposed to be thread-safe. De-facto I started
with the idea to have most of its methods thread-safe (as it was advertised by
the comments in client.h).
PS22, Line 1037: current
> Nit: "the current"
PS22, Line 1143: when
> Nit: "only when", right?
PS22, Line 1180: called
> Nit: "called the"
PS22, Line 1183: taken care
> Nit: "taken care of"
PS22, Line 1184: Once flushing of the prior
: /// mutation buffer is initiated, the new one is created and
: /// to accommodate incoming write operations, limit
> Nit: "Upon flushing the current mutation buffer, a new buffer is created to
Line 1188: /// The default and the minimum setting for this parameter is 2
> Hmm, technically couldn't the minimum be 1? If set to 1, an Apply() with an
Yes, technically the minimum could be 1.
Yes, the behavior of the KuduSession::Apply() you described is there, but it
now comes into play only when the incoming operation triggers flush of the
I put 2 to keep current semantics of always having the current batcher
available for incoming operations and allocating it when starting flush of the
I can modify the code and make it 1. That would involve checking for the
current batcher availability at every Apply() call. Right now it's guaranteed
the current batcher is always at place.
PS22, Line 1191: the
> Nit: drop 'the'
PS22, Line 1201: implicitly amended
: /// as if it were 0.
> Nit: "implicitly set to 0".
PS22, Line 80: the
> Nit: double the
Line 130: buffer_bytes_limit_ = size;
> Likewise, I don't see why this needs to be protected.
It seems that's a remnant from the older code which was written with
multi-threading safety requirement in mind.
Thanks, will remove.
Line 145: buffer_watermark_pct_ = watermark_pct;
> Why does this need to be protected? It's only ever accessed by the writer t
Right. That's the same as for the KuduSession::SetBufferBytesLimit() --
remnant from the past.
PS22, Line 191: Synchronizer s;
: KuduStatusMemberCallback<Synchronizer> ksmcb(&s,
: NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb,
> Why do we need to use the synchronizer to wait on this batch if, below, we
NextBatcher() not only flushes the current batcher, but it also allocates a new
one for next incoming operations.
Probably, it's better to separate those functions, and then it will be possible
to make 1 the minimum number for the batchers_num_limit_. I'll do that.
Back to your question. The NextBatcher() method has two modes of operation:
blocking and non-blocking. In non-blocking mode it reports an error if it's
not possible to allocate a new batcher due to the batchers_num_limit_ setting;
in blocking mode it blocks until it's possible to allocate a new one and do its
So, it's necessary to initiate flushing current batcher and then wait for
total_bytes_used_ to become 0.
BTW, the error collector does not capture errors due to batchers_num_limit_
limitations -- those errors are reported via the provided callback for the
NextBatcher() (which usually comes from the FlushAsync()).
PS22, Line 222: relevent
> Nit: relevant
PS22, Line 104: // On successful return, the output flush_mode parameter
: // is set to the effective flush mode.
> I thought we no longer needed the output parameter for flush mode?
The reason of returning the effective flush_mode in output parameter is that I
don't want to acquire the lock protecting flush_mode_ again just in the
upper-level KuduSession::Apply() -- that needs to know the flush mode to decide
whether it's necessary to do Flush() in case of AUTO_FLUSH_SYNC mode. Strictly
speaking, it's necessary either to access it under lock or make it atomic.
As for the task cancelation, I don't think it would be a better approach.
PS22, Line 142:
> Nit: looks like there are two spaces here. Below too.
PS22, Line 147: BATCHER_WATERMARK_NON_EMPTY
> Our convention for naming class constants like these "kCamelCaseFoo". We re
PS22, Line 158: // A dedicated lock to prevent multiple threads executing the
: // methods simultaneously. Should not be used for any other
> I don't understand; we've expressly forbidden multiple Apply() threads. So
Right. This is sort of defensive technique. Will remove.
To view, visit http://gerrit.cloudera.org:8080/3952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>