Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8395 )


Change subject: error_manager: synchronize/serialize handling
......................................................................

error_manager: synchronize/serialize handling

wip (lowercase): testing will come from disk-failure itests that are
still in flight. This patch doesn't add much on its own, given that we
still crash on disk failure at the moment, but I've tried to make the
case below for why it is necessary anyway.

The state of a tablet server post-disk-failure depends significantly on
the completion of disk-failure-handling callbacks. I.e. error handling
_must_ finish before anything is propagated back to the offending caller.
This is trickier when multiple calls are in flight that may trigger
error handling for a single tablet.

This patch adds a single-threaded threadpool to the error manager to
simplify such interleaved calls: when a disk fails, the caller that hit
the failure submits a callback to the threadpool and waits for all
error handling to finish. Errors that do not themselves trigger handling
but may be indirectly caused by disk failures can now wait for handling
to complete as well.
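
As a rough illustration of the pattern described above, here is a minimal,
self-contained sketch. It is not the code in this patch and does not use
Kudu's ThreadPool; the names SerialErrorHandler, Submit, Wait, and
IndirectFailureSeesCompletedHandling are hypothetical, and a plain
std::thread stands in for the single-threaded pool.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an error manager that serializes handling.
// One worker thread runs callbacks one at a time, in submission order.
class SerialErrorHandler {
 public:
  SerialErrorHandler() : worker_([this] { Run(); }) {}

  // Drains any queued callbacks, then stops the worker.
  ~SerialErrorHandler() {
    {
      std::lock_guard<std::mutex> l(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Queues an error-handling callback.
  void Submit(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> l(mu_);
      queue_.push_back(std::move(cb));
    }
    cv_.notify_all();
  }

  // Blocks until every callback submitted so far has finished.
  // Must not be called from inside a callback (it would deadlock).
  void Wait() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return queue_.empty() && !running_; });
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    for (;;) {
      cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
      if (queue_.empty()) return;  // Shutting down with nothing left to run.
      std::function<void()> cb = std::move(queue_.front());
      queue_.pop_front();
      running_ = true;
      l.unlock();
      cb();  // Run the callback without holding the lock.
      l.lock();
      running_ = false;
      cv_.notify_all();  // Wake any threads blocked in Wait().
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool running_ = false;
  bool shutdown_ = false;
  std::thread worker_;  // Declared last so it starts after the other members.
};

// A disk failure queues handling that fails the tablet; a caller that later
// hits an indirect error waits for handling before checking tablet state.
bool IndirectFailureSeesCompletedHandling() {
  std::atomic<bool> tablet_failed{false};
  SerialErrorHandler handler;
  handler.Submit([&tablet_failed] { tablet_failed = true; });
  handler.Wait();
  // Stand-in for a check that error handling for the tablet has completed.
  assert(tablet_failed.load());
  return tablet_failed.load();
}
```

Because a single worker drains the queue, callbacks never overlap, and a
caller that returns from Wait() knows that all handling queued before its
call has finished.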

As an example of where this is necessary, say a tablet has data in a
single directory and hits a bad disk. That directory is immediately
marked failed and handling starts to fail all tablets in the directory.
Previously, if the tablet tried to create a new block before being
failed, the creation would fail immediately, complaining that no
directories are available, and the tablet would eventually fail a CHECK
that translates roughly to: "Has error handling for this tablet
completed?"

With serialized error handling, this CHECK will pass, because calls that
might fail indirectly due to disk failure can now wait for handling to
complete.

Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
---
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
5 files changed, 57 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/8395/1
--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
