Dan Burkert has posted comments on this change.

Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode
......................................................................


Patch Set 9:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/batcher.h
File src/kudu/client/batcher.h:

Line 64:   // Takes a reference on error_collector. Creates a weak_ptr to 
'session'.
Could you update this comment, especially WRT the error_collector?  I can't 
tell if the intent is that this constructor takes ownership of the error 
collector, or if it's borrowed.


Line 115:   static int64_t GetOperationSizeInBuffer(KuduWriteOperation* 
write_op);
This doesn't seem too useful since it just calls write_op->SizeInBuffer().  Any 
reason to prefer having it on the batcher?


http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS9, Line 555: session
the session


PS9, Line 557: 10x multiplier
10x seems like a lot of leeway.  Could we get away with 2 or 3x?  I would 
consider it a bug if we routinely took 10x longer than the deadline.


PS9, Line 557: maximum 
the maximum


PS9, Line 2221: Status s;
looks like everywhere this is used the call could instead be wrapped in 
ASSERT_OK.  I think that also tends to give better error messages on failure.


PS9, Line 2307: scenrio
'scenario' here and below


PS9, Line 2478: previos
previous


http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.cc
File src/kudu/client/session-internal.cc:

Line 36: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024,
Remove this and use SetMutationBufferSpace instead.


Line 49: DEFINE_int32(client_buffer_flush_watermark_pct, 75,
Instead of a gflag, this should be an option on KuduSession, like 
SetMutationBufferSpace.  This also mirrors the equivalent Java options: 
https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setMutationBufferLowWatermark-float-


Line 61: DEFINE_int32(client_buffer_flush_timeout_ms, 10,
Same here, this should be an option on KuduSession.  
https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setFlushInterval-int-


PS9, Line 82: :
wrap init list.


Line 115:       messenger->ScheduleOnReactor(
It's not clear to me why this needs to be scheduled on a reactor thread.  I get 
that you don't want to block the user thread on the apply call, but if 
BackgroundFlusher::Run blocks, it's not appropriate to be running on the 
reactor thread anyway.


PS9, Line 154:   if (PREDICT_FALSE(!status.ok())) {
             :     return;
             :   }
             :   // Check that the session is still alive to access the data 
safely.
             :   sp::shared_ptr<KuduSession> session(weak_session.lock());
             :   if (PREDICT_FALSE(!session)) {
             :     return;
             :   }
             :   // Flush mode could change during the operation.  If current 
mode
             :   // is no longer AUTO_FLUSH_BACKGROUND, then stop re-scheduling 
the task.
             :   if (PREDICT_FALSE(data_->flush_mode_ != 
AUTO_FLUSH_BACKGROUND)) {
             :     return;
             :   }
             :   // Detach the batcher, schedule its flushing,
             :   // and install a new one to accumulate incoming write 
operations.
             :   data_->NextBatcher(weak_session, 0, nullptr);
Looks like this might be able to call Run instead of duplicating?


PS9, Line 205: MutexLock
prefer to use std::lock_guard if possible.


PS9, Line 392: int64_t* op_raw_size
is this still necessary now that the size is cached?


Line 421:       flow_control_cond_.Wait();
Is a flush forced somewhere in this situation, or is it just waiting for the 
periodic flush?


http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.h
File src/kudu/client/session-internal.h:

PS9, Line 111: BACKGFOUND
BACKGROUND


Line 126:     ~BackgroundFlusher();
I think this can use the default keyword instead of defining the no-op dtor in 
session-internal.cc.


Line 136:     void CheckAndRun(const std::shared_ptr<rpc::Messenger>& messenger,
Could you doc this?  I can't tell exactly what it's for, or why it chooses to 
run the flush on the reactor thread with no wait.


PS9, Line 151: KuduSession::Data* const
It looks like all of the methods on the background flusher that do any work 
take a weak_ptr.  I can't quite follow why this raw ptr is held, and the 
methods take the weak_ptr passed in.  Could a weak_ptr instead be passed in as 
part of the ctor and stored in a field, and then the methods above could use 
that?  I think that would also remove the dtor ordering thing below which is 
pretty fragile, since weak_ptrs can be checked before use.


PS9, Line 159: write operation
the write operation


-- 
To view, visit http://gerrit.cloudera.org:8080/3952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to