Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode ......................................................................
Patch Set 9: (22 comments) http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/batcher.h File src/kudu/client/batcher.h: Line 64: // Takes a reference on error_collector. Creates a weak_ptr to 'session'. > Could you update this comment, especially WRT the error_collector? I can't Done Line 115: static int64_t GetOperationSizeInBuffer(KuduWriteOperation* write_op); > This doesn't seem too useful since it just calls write_op->SizeInBuffer(). I didn't want to introduce an additional KuduSession::Data friend to the Batcher class (and it's not possible because Data is a nested class in the KuduSession class). Probably, there is a more elegant solution for this. Basically, I want to get wire size for KuduWriteOperation. http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS9, Line 555: session > the session Done PS9, Line 557: maximum > the maximum Done PS9, Line 557: 10x multiplier > 10x seems like a lot of leeway. Could we get away with 2 or 3x? I would c Done PS9, Line 2221: Status s; > looks like everywhere this is used the call could instead be wrapped in ASS Somehow I overlooked that :) Thanks! PS9, Line 2307: scenrio > 'scenario' here and below Done PS9, Line 2478: previos > previous Done http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 36: DEFINE_int32(client_buffer_bytes_limit, 7 * 1024 * 1024, > Remove this and use SetMutationBufferSpace instead. Done Line 49: DEFINE_int32(client_buffer_flush_watermark_pct, 75, > Instead of a gflag, this should be an option on KuduSession, like SetMutati Done Line 61: DEFINE_int32(client_buffer_flush_timeout_ms, 10, > Same here, this should be an option on KuduSession. https://kudu.apache.or Done PS9, Line 82: : > wrap init list. Done Line 115: messenger->ScheduleOnReactor( > It's not clear to me why this needs to be scheduled on a reactor thread. I Good point! The original design had a separate thread which was busy with all flush-related activity, and I just transferred it on the new scheme with messenger/reactor threads. This should not be blocking, so I can safely call it from the same thread where CheckAndRun is being executed. PS9, Line 154: if (PREDICT_FALSE(!status.ok())) { : return; : } : // Check that the session is still alive to access the data safely. : sp::shared_ptr<KuduSession> session(weak_session.lock()); : if (PREDICT_FALSE(!session)) { : return; : } : // Flush mode could change during the operation. If current mode : // is no longer AUTO_FLUSH_BACKGROUND, then stop re-scheduling the task. : if (PREDICT_FALSE(data_->flush_mode_ != AUTO_FLUSH_BACKGROUND)) { : return; : } : // Detach the batcher, schedule its flushing, : // and install a new one to accumulate incoming write operations. : data_->NextBatcher(weak_session, 0, nullptr); > Looks like this might be able to call Run instead of duplicating? Done PS9, Line 205: MutexLock > prefer to use std::lock_guard if possible. Done PS9, Line 392: int64_t* op_raw_size > is this still necessary now that the size is cached? yep -- that's the issue of visibility of a private method. I cannot make an internal class a friend to the Batcher class. Line 421: flow_control_cond_.Wait(); > Is a flush forced somewhere in this situation, or is it just waiting for th The condition awaits till total size of pending operations drops below the limit. This is enforced by the logic of the code. If no updates happen on total_bytes_used_, this will wait forever. http://gerrit.cloudera.org:8080/#/c/3952/9/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS9, Line 111: BACKGFOUND > BACKGROUND Done Line 126: ~BackgroundFlusher(); > I think this can use the default keyword instead of defining the no-op dtor OK, I just removed it to rely on auto-generated ones. Line 136: void CheckAndRun(const std::shared_ptr<rpc::Messenger>& messenger, > Could you doc this? I can't tell exactly what it's for, or why it chooses Renamed and added more comments. PS9, Line 151: KuduSession::Data* const > It looks like all of the methods on the background flusher that do any work It's a reference (a pointer, actually) to the outer/enclosing object. It's possible to keep those weak pointers as members of the BackgroundFlusher, but when a BackgroundFlusher object, as a sub-object of KuduSession::Data, is deallocated on destruction of the enclosing KuduSession::Data object, those members cannot be used anyway. So, I'm putting all those into the parameters of the methods which are being called by the reactor threads. Basically, necessary weak references on involved object are kept in the functor object, which, in its turn, stored by corresponding reactor thread. The BackgroundFlusher is a shim object; it's a leaky abstraction. Actually, I was even thinking of having all of its methods static and passing everything by parameters. But that was too uglier. The destructor ordering thing is stale now. It was actual in other version of this patch when a separate thread did flushing. Will remove the stale comment -- thanks! PS9, Line 159: write operation > the write operation Done -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes