Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode ......................................................................
Patch Set 22: (17 comments) Thank you for the review. I'll try to take a fresh look in the context of simplifying the code. Frankly, it would help a lot if it were clear from the very beginning that the KuduSession interface is not supposed to be thread-safe. De-facto I started with the idea to have most of its methods thread-safe (as it was advertised by the comments in client.h). http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/client.h File src/kudu/client/client.h: PS22, Line 1037: current > Nit: "the current" Done PS22, Line 1143: when > Nit: "only when", right? Done PS22, Line 1180: called > Nit: "called the" Done PS22, Line 1183: taken care > Nit: "taken care of" Done PS22, Line 1184: Once flushing of the prior : /// mutation buffer is initiated, the new one is created and set current : /// to accommodate incoming write operations, limit permitting. > Nit: "Upon flushing the current mutation buffer, a new buffer is created to Done Line 1188: /// The default and the minimum setting for this parameter is 2 (two). > Hmm, technically couldn't the minimum be 1? If set to 1, an Apply() with an Yes, technically the minimum could be 1. Yes, the behavior of the KuduSession::Apply() you described is there, but it now comes into play only when the incoming operation triggers flush of the current buffer. I put 2 to keep current semantics of always having the current batcher available for incoming operations and allocating it when starting flush of the prior one. I can modify the code and make it 1. That would involve checking for the current batcher availability at every Apply() call. Right now it's guaranteed the current batcher is always at place. PS22, Line 1191: the > Nit: drop 'the' Done PS22, Line 1201: implicitly amended : /// as if it were 0. > Nit: "implicitly set to 0". Done http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: PS22, Line 80: the > Nit: double the Done Line 130: buffer_bytes_limit_ = size; > Likewise, I don't see why this needs to be protected. It seems that's a remnant from the older code which was written with multi-threading safety requirement in mind. Thanks, will remove. Line 145: buffer_watermark_pct_ = watermark_pct; > Why does this need to be protected? It's only ever accessed by the writer t Right. That's the same as for the KuduSession::SetBufferBytesLimit() -- remnant from the past. PS22, Line 191: Synchronizer s; : KuduStatusMemberCallback<Synchronizer> ksmcb(&s, &Synchronizer::StatusCB); : NextBatcher(session, BATCHER_WATERMARK_NON_EMPTY, &ksmcb, BLOCK); : RETURN_NOT_OK(s.Wait()); > Why do we need to use the synchronizer to wait on this batch if, below, we NextBatcher() not only flushes the current batcher, but it also allocates a new one for next incoming operations. Probably, it's better to separate those functions, and then it will be possible to make 1 the minimum number for the batchers_num_limit_. I'll do that. Back to your question. The NextBatcher() method has two modes of operation: blocking and non-blocking. In non-blocking mode it reports an error if it's not possible to allocate a new batcher due to the batchers_num_limit_ setting; in blocking mode it blocks until it's possible to allocate a new one and do its job. So, it's necessary to initiate flushing current batcher and then wait for total_bytes_used_ to become 0. BTW, the error collector does not capture errors due to batchers_num_limit_ limitations -- those errors are reported via the provided callback for the NextBatcher() (which usually comes from the FlushAsync()). PS22, Line 222: relevent > Nit: relevant Done http://gerrit.cloudera.org:8080/#/c/3952/22/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS22, Line 104: // On successful return, the output flush_mode parameter : // is set to the effective flush mode. > I thought we no longer needed the output parameter for flush mode? Right. The reason of returning the effective flush_mode in output parameter is that I don't want to acquire the lock protecting flush_mode_ again just in the upper-level KuduSession::Apply() -- that needs to know the flush mode to decide whether it's necessary to do Flush() in case of AUTO_FLUSH_SYNC mode. Strictly speaking, it's necessary either to access it under lock or make it atomic. As for the task cancelation, I don't think it would be a better approach. PS22, Line 142: > Nit: looks like there are two spaces here. Below too. Done PS22, Line 147: BATCHER_WATERMARK_NON_EMPTY > Our convention for naming class constants like these "kCamelCaseFoo". We re Done PS22, Line 158: // A dedicated lock to prevent multiple threads executing the ApplyWriteOp() : // methods simultaneously. Should not be used for any other purpose. > I don't understand; we've expressly forbidden multiple Apply() threads. So Right. This is sort of defensive technique. Will remove. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes