Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8395 )

Change subject: error_manager: synchronize/serialize handling
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8395/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8395/3//COMMIT_MSG@13
PS3, Line 13:
            : The state of a tablet server post-disk-failure depends 
significantly on
            : the completion of disk-failure-handling callbacks. I.e. error 
handling
            : _must_ finish before anything is propagated back to the offending 
caller.
            : This is trickier when multiple calls are in flight that may 
trigger
            : error handling for a single tablet.
Andrew and I discussed this at length; let me try to summarize what we talked 
about.

The error notification callback alluded to isn't in this patch, or any other 
published patch yet. It's in another patch that Andrew is working on. Roughly 
speaking, here's what it does:

  1. Mark the data dir as failed.
  2. Get all the tablets using that data dir.
  3. For each tablet:
  3a. Stop the tablet.
  3b. Cancel its outstanding maintenance ops
  3c. Asynchronously stop its replica.

Also importantly, some disk failure paths invoke this callback, while others 
(e.g. all data directories have failed) do not; they just return the error up 
the stack.

The invariant Andrew is alluding to is a new one: an MM op that fails due to a 
disk failure will now CHECK that its tablet has been stopped. So here's the 
issue:

  1. There are two outstanding MM ops for a given tablet with just one data dir.
  2. T1 encounters a disk failure and begins running the error notification 
callback. It gets as far as marking the data dir as failed.
  3. T2 tries to get a new block and gets an "all data directories have failed" 
error.
  4. T2 finishes its op with this error before T1 makes more progress.
  5. T2 fails the CHECK because T1 has yet to stop the tablet.

I found the indirect "wait for an error notification callback" approach to be 
confusing, and suggested an alternative: errors like "all data directories have 
failed" should also trigger an error notification callback, a different kind 
that provides the associated tablet ID instead of the data directory. The error 
manager should synchronize this callback type the same way as the existing 
type: by acquiring a mutex while running the callback.

What I like about this alternative is that, from the perspective of the block 
managers it's business as usual: there was an IOError and I need to run a 
notification callback. There's no need to choose between running a callback and 
waiting.


http://gerrit.cloudera.org:8080/#/c/8395/3/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

http://gerrit.cloudera.org:8080/#/c/8395/3/src/kudu/fs/error_manager.h@84
PS3, Line 84: class FsErrorManager {
Could you start a error_manager.cc file and move most (if not all?) of the 
implementations there? AFAICT none of the methods here are performance critical 
(and thus deserving of inlining), and we could forward declare more (and 
include less) in the header if the implementations were hidden.


http://gerrit.cloudera.org:8080/#/c/8395/3/src/kudu/fs/error_manager.h@136
PS3, Line 136:    // Protects access to the threadpool.
Not necessary; ThreadPool is already fully thread safe.



--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:54:55 +0000
Gerrit-HasComments: Yes

Reply via email to