Hello Tidy Bot, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8395

to look at the new patch set (#3).

Change subject: error_manager: synchronize/serialize handling
......................................................................

error_manager: synchronize/serialize handling

lowercased wip: testing is done in the form of disk-failure itests that
are in-flight; additionally, this doesn't add much given we still crash
on disk-failure at the moment. I tried to make a case for why this is
necessary here anyway.

The state of a tablet server post-disk-failure depends significantly on
the completion of disk-failure-handling callbacks. I.e. error handling
_must_ finish before anything is propagated back to the offending
caller. This is trickier when multiple calls are in flight that may
trigger error handling for a single tablet.

This patch adds a single-threaded threadpool to the error manager to
simplify such interweaved calls: when a disk fails, it submits a
callback to the threadpool and waits for all error-handling to finish.
Errors that may not themselves trigger handling but may be indirectly
caused by disk failures can now wait for handling to complete.

As an example of where this is necessary, say a tablet has data in a
single directory and hits a bad disk. That directory is immediately
marked failed and handling starts to fail all tablets in the directory.
Before, if the tablet were to create a new block before being failed,
it would fail immediately, complaining that no directories are
available, and would eventually fail a CHECK that translates roughly
to: "Has error handling for this tablet completed?" With serialized
error-handling, this CHECK will pass, as calls that might fail
indirectly due to disk failure can now wait for handling to complete.

Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/data_dirs-test.cc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
9 files changed, 113 insertions(+), 53 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/8395/3
--
To view, visit http://gerrit.cloudera.org:8080/8395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie61c408a0b4424f933f40a31147568c2f906be0e
Gerrit-Change-Number: 8395
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
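
For readers following along, the serialization described in the commit message might look roughly like the sketch below. It is not the code in this patch set: it uses only the C++ standard library rather than Kudu's own threadpool, and every name in it (SerializedErrorHandler, RunErrorHandlingAndWait, WaitForErrorHandling, HandlingCompletedBeforeIndirectErrorReturns) is hypothetical. The idea it illustrates is the one stated above: a single worker runs error-handling callbacks in FIFO order, the caller that hit the bad disk blocks until its callback has run, and a caller that only failed indirectly queues a no-op behind any in-flight handling and waits for it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the error manager; names are NOT from Kudu.
class SerializedErrorHandler {
 public:
  SerializedErrorHandler() : worker_([this] { RunLoop(); }) {}

  ~SerializedErrorHandler() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Queues 'cb' behind any handling already submitted and blocks the
  // caller until 'cb' itself has finished running on the worker.
  void RunErrorHandlingAndWait(std::function<void()> cb) {
    assert(cb != nullptr);
    std::packaged_task<void()> task(std::move(cb));
    std::future<void> done = task.get_future();
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push_back(std::move(task));
    }
    cv_.notify_one();
    done.wait();
  }

  // For errors that may be an indirect result of a disk failure: a no-op
  // queued on the single worker completes only after all earlier handling.
  void WaitForErrorHandling() { RunErrorHandlingAndWait([] {}); }

 private:
  void RunLoop() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> l(lock_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shut down once the queue is drained.
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();  // Runs outside the lock, one callback at a time.
    }
  }

  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<std::packaged_task<void()>> queue_;
  bool shutdown_ = false;
  std::thread worker_;  // Declared last so the members above exist first.
};

// Mimics the example in the commit message: one caller hits the bad disk and
// triggers handling; a second caller fails indirectly while handling is still
// running and must not proceed until the tablet has been marked failed.
bool HandlingCompletedBeforeIndirectErrorReturns() {
  SerializedErrorHandler handler;
  std::atomic<bool> tablet_failed{false};
  std::promise<void> started;
  std::thread failing_caller([&] {
    handler.RunErrorHandlingAndWait([&] {
      started.set_value();  // Handling is now in flight.
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
      tablet_failed = true;  // "Fail all tablets in the directory."
    });
  });
  started.get_future().wait();
  handler.WaitForErrorHandling();  // The indirect failure waits here.
  bool observed = tablet_failed;
  failing_caller.join();
  return observed;
}
```

In this sketch the second caller always observes the tablet as failed, which corresponds to the CHECK passing once handling is serialized. A real implementation would also need to decide how handling errors are reported back to waiters, which the sketch ignores.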