[kudu-CR] disk failure: don't fail CHECKs for disk failures

Mike Percy (Code Review) Mon, 17 Jul 2017 17:55:16 -0700

Mike Percy has posted comments on this change.

Change subject: disk failure: don't fail CHECKs for disk failures
......................................................................



Patch Set 2:

(2 comments)

Like I alluded in some comments, I think this patch needs to be split out into 
one patch per change. Each removal of a CHECK should include tests for all of 
the new cases that used to be impossible to hit without a crash.

I also think it would be good to have an overall error handling design 
documented, either in a doc or a header file or something, so that we have a 
base layer of error handling set up and then systematically implement the new 
error handling in each crash case until we don't crash anymore and we know we 
safely handle each former crash case.

http://gerrit.cloudera.org:8080/#/c/7442/2/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

Line 32: DECLARE_bool(suicide_on_eio);
unused?


Line 137:     LOG(ERROR) << Substitute("FlushMRS failed on $0", 
tablet_replica_->tablet_id());
Hrm. Is it safe to change a CHECK() to a LOG(ERROR) here? I would like to see 
tests for all the new possible error cases we are unwrapping here.

Diving into FlushUnlocked(), it looks like we create a new MRS and then we call 
FlushInternal(). Can we fail to create a new MRS? In FlushInternal(), there is 
a CHECK based on state_ and I am not sure how that is enforced.

I think this change by itself deserved its own patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/7442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

[kudu-CR] disk failure: don't fail CHECKs for disk failures

Reply via email to