Alexey Serbin has posted comments on this change.

Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode
......................................................................


Patch Set 22:

(17 comments)

Thank you for the review.

I'll try to take a fresh look in the context of simplifying the code.

Frankly, it would help a lot if it were clear from the very beginning that the 
KuduSession interface is not supposed to be thread-safe.  De-facto I started 
with the idea to have most of its methods thread-safe (as it was advertised by 
the comments in client.h).

http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/client.h
File src/kudu/client/client.h:

PS22, Line 1037: current
> Nit: "the current"
Done


PS22, Line 1143: when
> Nit: "only when", right?
Done


PS22, Line 1180: called
> Nit: "called the"
Done


PS22, Line 1183: taken care
> Nit: "taken care of"
Done


PS22, Line 1184: Once flushing of the prior
               :   /// mutation buffer is initiated, the new one is created and 
set current
               :   /// to accommodate incoming write operations, limit 
permitting.
> Nit: "Upon flushing the current mutation buffer, a new buffer is created to
Done


Line 1188:   /// The default and the minimum setting for this parameter is 2 
(two).
> Hmm, technically couldn't the minimum be 1? If set to 1, an Apply() with an
Yes, technically the minimum could be 1.

Yes, the behavior of the KuduSession::Apply() you described is there, but it 
now comes into play only when the incoming operation triggers flush of the 
current buffer.

I put 2 to keep current semantics of always having the current batcher 
available for incoming operations and allocating it when starting flush of the 
prior one.

I can modify the code and make it 1.  That would involve checking for the 
current batcher availability at every Apply() call.  Right now it's guaranteed 
the current batcher is always at place.


PS22, Line 1191: the
> Nit: drop 'the'
Done


PS22, Line 1201: implicitly amended
               :   ///   as if it were 0.
> Nit: "implicitly set to 0".
Done


http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc
File src/kudu/client/session-internal.cc:

PS22, Line 80: the
> Nit: double the
Done


Line 130:   buffer_bytes_limit_ = size;
> Likewise, I don't see why this needs to be protected.
It seems that's a remnant from the older code which was written with 
multi-threading safety requirement in mind.

Thanks, will remove.


Line 145:   buffer_watermark_pct_ = watermark_pct;
> Why does this need to be protected? It's only ever accessed by the writer t
Right.  That's the same as for the KuduSession::SetBufferBytesLimit() -- 
remnant from the past.


PS22, Line 191:   Synchronizer s;
              :   KuduStatusMemberCallback<Synchronizer> ksmcb(&s, 
&Synchronizer::StatusCB);
              :   NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, 
BLOCK);
              :   RETURN_NOT_OK(s.Wait());
> Why do we need to use the synchronizer to wait on this batch if, below, we 
NextBatcher() not only flushes the current batcher, but it also allocates a new 
one for next incoming operations.

Probably, it's better to separate those functions, and then it will be possible 
to make 1 the minimum number for the batchers_num_limit_.  I'll do that.

Back to your question.  The NextBatcher() method has two modes of operation: 
blocking and non-blocking.  In non-blocking mode it reports an error if it's 
not possible to allocate a new batcher due to the batchers_num_limit_ setting; 
in blocking mode it blocks until it's possible to allocate a new one and do its 
job.

So, it's necessary to initiate flushing current batcher and then wait for 
total_bytes_used_ to become 0.

BTW, the error collector does not capture errors due to batchers_num_limit_ 
limitations -- those errors are reported  via the provided callback for the 
NextBatcher() (which usually comes from the FlushAsync()).


PS22, Line 222: relevent
> Nit: relevant
Done


http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.h
File src/kudu/client/session-internal.h:

PS22, Line 104:   // On successful return, the output flush_mode parameter
              :   // is set to the effective flush mode.
> I thought we no longer needed the output parameter for flush mode?
Right.

The reason of returning the effective flush_mode in output parameter is that I 
don't want to acquire the lock protecting flush_mode_ again just in the 
upper-level KuduSession::Apply() -- that needs to know the flush mode to decide 
whether it's necessary to do Flush() in case of AUTO_FLUSH_SYNC mode.  Strictly 
speaking, it's necessary either to access it under lock or make it atomic.

As for the task cancelation, I don't think it would be a better approach.


PS22, Line 142:   
> Nit: looks like there are two spaces here. Below too.
Done


PS22, Line 147: BATCHER_WATERMARK_NON_EMPTY
> Our convention for naming class constants like these "kCamelCaseFoo". We re
Done


PS22, Line 158:   // A dedicated lock to prevent multiple threads executing the 
ApplyWriteOp()
              :   // methods simultaneously. Should not be used for any other 
purpose.
> I don't understand; we've expressly forbidden multiple Apply() threads. So 
Right.  This is sort of defensive technique.  Will remove.


-- 
To view, visit http://gerrit.cloudera.org:8080/3952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to