[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode

Alexey Serbin (Code Review) Tue, 30 Aug 2016 19:42:51 -0700

Alexey Serbin has posted comments on this change.

Change subject: KUDU-456 Implement AUTO_FLUSH_BACKGROUND flush mode
......................................................................



Patch Set 23:

(27 comments)

Dan & Adar, thank you for the review.  I'll post updated version as soon as I 
make sure the tests are passing OK.

http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 474:     Status s = had_errors_ ? Status::Incomplete("Some errors 
occurred")
> I don't think Incomplete is the right status here; the flush is complete.
OK, that reasoning makes sense.

However, I thought IOError might be irrelevant here as well.  Or this is the 
best match among all possible status codes?  There is seemingly more generic 
RunTimeError, but I'm not sure that's the right case to use it.

I'll try to return IOError back to get back to the original, and meanwhile we 
can discuss it separately.


http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

PS23, Line 2360: there the current batcher is at place
> Not sure what this is trying to say, maybe:
ops, my bad.

Should have been something like 'OK, now the current batcher should be at 
place.'

Will fix.


http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/client.cc
File src/kudu/client/client.cc:

Line 322:                                            hp);
> this can go on the line above.
Done


http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.cc
File src/kudu/client/session-internal.cc:

Line 63:   TimeBasedFlushInit(session);
> How about checking if we are in AUTO_FLUSH_BACKGROUND first?  I know this i
The check you mentioned is done by TimeBasedFlushTask() method _before_ 
scheduling a task (i.e. no task scheduled if current mode is not 
AUTO_FLUSH_BACKGROUND).  So, I see no need to perform that check here.

I put all those ugly guts (synchronization, etc.) into that method with 
intention to avoid doing those checks elsewhere.


Line 63:   TimeBasedFlushInit(session);
> The funny thing is that, by definition, we cannot be in AUTO_FLUSH_BACKGROU
TimeBasedFlushInit() is no-op in other than AUTO_FLUSH_BACKGROUND mode.

Yes, it might be a bad name for the method.

What name would you recommend instead?


PS23, Line 105:    std::lock_guard<simple_spinlock> l(lock_);
> Since this isn't acquired on any hot path, what do you think of removing it
I thought it was better to keep those locks separate to avoid having one giant 
lock for all the methods.  Those locks based on simple_spinlock look lighter 
than based on mutex.  But as you mentioned, since we already have that for the 
most critical paths, and the rest with simple_spinlock are just for methods 
which almost never called, I think it's a good idea.

Will do, thanks!


Line 118:   // Thread-safety note: the buffer_bytes_limit_ is not supposed to 
be accessed
> These thread safety notes (here and below) would be better in the header de
I tried that, but from the reading perspective it looked worse -- there is not 
much of context when looking at the signature of a function/method.  However, 
in the body of a method, there is much more relevant context since you can see 
the logic of the method, more comments, corresponding variables/members, etc.

That's said, I'm hesitant to move those in the header.

However, if you feel strong about that and have a compelling reasoning which 
addresses my concern regarding the lack of context, I'll move those.


Line 156: Status KuduSession::Data::SetMaxBatchersNum(unsigned int max_num) {
> How come this one doesn't check HasPendingOperations?
I didn't see a restriction that would arise from the current design.

Actually, we can remove that for SetBufferFlushInterval() and other except for 
the KuduSession::SetFlushMode().

But as I see from our recent HipChat discussion, we want to keep us more space 
for the future, because once published, it's easier to make those restrictions 
less strict.  But making these more strict could be almost impossible in the 
future.

So, I added the artificial restriction here as well.


Line 181:   FlushCurrentBatcher(1, cb);
> Nit: add a comment explaining the significance of 1 (i.e. why not 0?)
Done


PS23, Line 188: nullptr
> I spent a lot of time wondering if the loss of the synchronizer and RETURN_
I did a research on that prior to using nullptr as a callback in the new code 
and I didn't see any cases where an error would be dropped on the floor (i.e. 
in all cases where Batcher::had_errors_ is set to true, the error is registered 
into the Batcher::error_collector_).  So, it's safe as far as I can see.  But 
I'll add corresponding comments into the batcher and for the 
FlushCurrentBatcher() methods to clarify on that.

I think it's a good idea to document the difference.  I even thought I had done 
it, but re-visiting the client.h comments revealed that I missed that part.  
Thank you for the reminder -- I'll add corresponding comments there.


PS23, Line 283: nil
> NULL
Done


PS23, Line 296: The flush_mode_ is accessed
              :   // from the background time-based flush task, but it accesses 
it
              :   // only for reading.
> TSAN may complain about this. ISTR it doesn't like accesses to the same mem
OK, I see.  I suspected something like that, but since I haven't seen anything 
from TSAN, I though it was happy.

I will protect flush_mode_ as needed.  Thanks!


Line 354:         batch_ctl_cond_.Wait();
> This isn't going to work for ApplyAsync(). Now, we haven't actually impleme
Yep.  For ApplyAsync() the control would go into the next clause instead.  So, 
it's about having something like

if (apply_mode == SYNC && flush_mode == AUTO_FLUSH_BACKGROUND) {
  ...
} else if (...) {
  ...
} 

For the batchers limit it will be the something like

if (apply_mode == ASYNC && !batcher_ && batchers_num_limit_ != 0 && 
batchers_num_ >= bathers_num_limit_) {
  return Status::Aborted("too many batchers");
}

Basically, we are introducing an additional parameter for this method: 
apply_mode.  May be, brushing up the code a little.

Then it should work as expected, right?


PS23, Line 356: }
              :     if (
> should this be an else if?
Yes, indeed.  Good observation!  Somehow I missed this while doing the 
re-structuring.


Line 368:     buffer_bytes_used_ += required_size;
> hmm, this seems a little early.  I can't think of a reason it would matter,
Good catch!

It seems this is a remnant from previous multi-mutex version, where there were 
separate mutexes for batcher and byte counter.


Line 379:       if (!batcher_) {
> Will the batcher_ ever be non-null here?  I can't think of a reason why it 
Another multi-thread remnant -- that were needed when it was possible to have 
multiple threads calling Apply().

Having DCHECK() would not hurt, right.


Line 399:       buffer_bytes_used_ -= required_size;
> Another good reason not to increment this counter until after the op is suc
Done


http://gerrit.cloudera.org:8080/#/c/3952/23/src/kudu/client/session-internal.h
File src/kudu/client/session-internal.h:

PS23, Line 42: It's
> Nit: Its
Done


PS23, Line 43: implemented
> Nit: don't need this word.
Done


PS23, Line 46: Aply
> Apply
Done


PS23, Line 91: finished their flushing
> finish flushing
Done


PS23, Line 91: Return once no more pending operations are left.
> this sentence seems redundant.
Done


PS23, Line 128:   // The self-rescheduling task to flush write operations which 
have been
              :   // accumulating for too long (controlled by flush_interval_).
              :   // This does real work only in case of AUTO_FLUSH_BACKGROUND 
mode.
> Should document what 'do_startup_check' means.
Done


PS23, Line 172: requries
> Nit: "requires"
Done


PS23, Line 172: which
> Nit: "whose"
Done


Line 191:   std::unordered_set<internal::Batcher*> flushed_batchers_;
> Why is it necessary to hold on to these batchers?  I see that it's used for
That's a good question!  Besides using those for current implementation of 
HasPendingOperations() methods, those are also used for independent count of 
bytes of currently pending operations: GetPendingOperationsSize().  I see this 
method is currently used only in tests, so it might be a good idea to resent to 
direct call of the KuduSession::Data internals (as those tests now do anyways), 
and try to get rid of this set of pointers.

May be, it was also good from the debugging perspective?


PS23, Line 199: // protected by batch_ctl_mutex_
> the setter doesn't take batch_ctl_mutex_
Thanks -- it seems this comment is stale and is left from times when 
KuduSession was thread-safe.  Will fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/3952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34905c30b3aad96f53cf7a1822b1cde6d25f33a8
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-456 Implement AUTO FLUSH BACKGROUND flush mode

Reply via email to