Alexey Serbin has posted comments on this change. Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode ......................................................................
Patch Set 23: (27 comments) Dan & Adar, thank you for the review. I'll post updated version as soon as I make sure the tests are passing OK. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 474: Status s = had_errors_ ? Status::Incomplete("Some errors occurred") > I don't think Incomplete is the right status here; the flush is complete. OK, that reasoning makes sense. However, I thought IOError might be irrelevant here as well. Or this is the best match among all possible status codes? There is seemingly more generic RunTimeError, but I'm not sure that's the right case to use it. I'll try to return IOError back to get back to the original, and meanwhile we can discuss it separately. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: PS23, Line 2360: there the current batcher is at place > Not sure what this is trying to say, maybe: ops, my bad. Should have been something like 'OK, now the current batcher should be at place.' Will fix. http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client.cc File src/kudu/client/client.cc: Line 322: hp); > this can go on the line above. Done http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc File src/kudu/client/session-internal.cc: Line 63: TimeBasedFlushInit(session); > How about checking if we are in AUTO_FLUSH_BACKGROUND first? I know this i The check you mentioned is done by TimeBasedFlushTask() method _before_ scheduling a task (i.e. no task scheduled if current mode is not AUTO_FLUSH_BACKGROUND). So, I see no need to perform that check here. I put all those ugly guts (synchronization, etc.) into that method with intention to avoid doing those checks elsewhere. Line 63: TimeBasedFlushInit(session); > The funny thing is that, by definition, we cannot be in AUTO_FLUSH_BACKGROU TimeBasedFlushInit() is no-op in other than AUTO_FLUSH_BACKGROUND mode. Yes, it might be a bad name for the method. What name would you recommend instead? PS23, Line 105: std::lock_guard<simple_spinlock> l(lock_); > Since this isn't acquired on any hot path, what do you think of removing it I thought it was better to keep those locks separate to avoid having one giant lock for all the methods. Those locks based on simple_spinlock look lighter than based on mutex. But as you mentioned, since we already have that for the most critical paths, and the rest with simple_spinlock are just for methods which almost never called, I think it's a good idea. Will do, thanks! Line 118: // Thread-safety note: the buffer_bytes_limit_ is not supposed to be accessed > These thread safety notes (here and below) would be better in the header de I tried that, but from the reading perspective it looked worse -- there is not much of context when looking at the signature of a function/method. However, in the body of a method, there is much more relevant context since you can see the logic of the method, more comments, corresponding variables/members, etc. That's said, I'm hesitant to move those in the header. However, if you feel strong about that and have a compelling reasoning which addresses my concern regarding the lack of context, I'll move those. Line 156: Status KuduSession::Data::SetMaxBatchersNum(unsigned int max_num) { > How come this one doesn't check HasPendingOperations? I didn't see a restriction that would arise from the current design. Actually, we can remove that for SetBufferFlushInterval() and other except for the KuduSession::SetFlushMode(). But as I see from our recent HipChat discussion, we want to keep us more space for the future, because once published, it's easier to make those restrictions less strict. But making these more strict could be almost impossible in the future. So, I added the artificial restriction here as well. Line 181: FlushCurrentBatcher(1, cb); > Nit: add a comment explaining the significance of 1 (i.e. why not 0?) Done PS23, Line 188: nullptr > I spent a lot of time wondering if the loss of the synchronizer and RETURN_ I did a research on that prior to using nullptr as a callback in the new code and I didn't see any cases where an error would be dropped on the floor (i.e. in all cases where Batcher::had_errors_ is set to true, the error is registered into the Batcher::error_collector_). So, it's safe as far as I can see. But I'll add corresponding comments into the batcher and for the FlushCurrentBatcher() methods to clarify on that. I think it's a good idea to document the difference. I even thought I had done it, but re-visiting the client.h comments revealed that I missed that part. Thank you for the reminder -- I'll add corresponding comments there. PS23, Line 283: nil > NULL Done PS23, Line 296: The flush_mode_ is accessed : // from the background time-based flush task, but it accesses it : // only for reading. > TSAN may complain about this. ISTR it doesn't like accesses to the same mem OK, I see. I suspected something like that, but since I haven't seen anything from TSAN, I though it was happy. I will protect flush_mode_ as needed. Thanks! Line 354: batch_ctl_cond_.Wait(); > This isn't going to work for ApplyAsync(). Now, we haven't actually impleme Yep. For ApplyAsync() the control would go into the next clause instead. So, it's about having something like if (apply_mode == SYNC && flush_mode == AUTO_FLUSH_BACKGROUND) { ... } else if (...) { ... } For the batchers limit it will be the something like if (apply_mode == ASYNC && !batcher_ && batchers_num_limit_ != 0 && batchers_num_ >= bathers_num_limit_) { return Status::Aborted("too many batchers"); } Basically, we are introducing an additional parameter for this method: apply_mode. May be, brushing up the code a little. Then it should work as expected, right? PS23, Line 356: } : if ( > should this be an else if? Yes, indeed. Good observation! Somehow I missed this while doing the re-structuring. Line 368: buffer_bytes_used_ += required_size; > hmm, this seems a little early. I can't think of a reason it would matter, Good catch! It seems this is a remnant from previous multi-mutex version, where there were separate mutexes for batcher and byte counter. Line 379: if (!batcher_) { > Will the batcher_ ever be non-null here? I can't think of a reason why it Another multi-thread remnant -- that were needed when it was possible to have multiple threads calling Apply(). Having DCHECK() would not hurt, right. Line 399: buffer_bytes_used_ -= required_size; > Another good reason not to increment this counter until after the op is suc Done http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h File src/kudu/client/session-internal.h: PS23, Line 42: It's > Nit: Its Done PS23, Line 43: implemented > Nit: don't need this word. Done PS23, Line 46: Aply > Apply Done PS23, Line 91: finished their flushing > finish flushing Done PS23, Line 91: Return once no more pending operations are left. > this sentence seems redundant. Done PS23, Line 128: // The self-rescheduling task to flush write operations which have been : // accumulating for too long (controlled by flush_interval_). : // This does real work only in case of AUTO_FLUSH_BACKGROUND mode. > Should document what 'do_startup_check' means. Done PS23, Line 172: requries > Nit: "requires" Done PS23, Line 172: which > Nit: "whose" Done Line 191: std::unordered_set<internal::Batcher*> flushed_batchers_; > Why is it necessary to hold on to these batchers? I see that it's used for That's a good question! Besides using those for current implementation of HasPendingOperations() methods, those are also used for independent count of bytes of currently pending operations: GetPendingOperationsSize(). I see this method is currently used only in tests, so it might be a good idea to resent to direct call of the KuduSession::Data internals (as those tests now do anyways), and try to get rid of this set of pointers. May be, it was also good from the debugging perspective? PS23, Line 199: // protected by batch_ctl_mutex_ > the setter doesn't take batch_ctl_mutex_ Thanks -- it seems this comment is stale and is left from times when KuduSession was thread-safe. Will fix. -- To view, visit http://gerrit.cloudera.org:8080/3952 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8 Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes