Todd Lipcon has posted comments on this change.

Change subject: disk failure: add persistent disk states
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7270/15//COMMIT_MSG
Commit Message:

Line 11: failed disk may be partially written and thus should not be used.
I don't quite follow this logic. If we crashed in the middle of writing a block
(due to an IO error on the disk or whatever), then the metadata flush which
points to the half-written block would not have happened yet. Thus, we don't
have any partially-written data which is referred to by any metadata.
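
A toy sketch of the ordering argument above, assuming block data is always
written before the metadata that points to it. This is not Kudu's actual block
manager; ToyStore and all of its methods are hypothetical names used only for
illustration.

```python
class ToyStore:
    """Toy model: block data is written first, metadata is flushed afterwards."""

    def __init__(self):
        self.data = {}      # block_id -> bytes that reached the data disk
        self.metadata = []  # block ids referred to by flushed metadata

    def write_block(self, block_id, payload, fail_after=None):
        # Step 1: write the block contents. A simulated IO error may cut
        # this short, leaving a half-written block on disk.
        written = payload if fail_after is None else payload[:fail_after]
        self.data[block_id] = written
        if fail_after is not None:
            raise IOError("simulated disk failure mid-write")
        # Step 2: only once the data is complete, flush metadata pointing at it.
        self.metadata.append(block_id)

    def referenced_blocks(self):
        # What a restart would see: only blocks named by flushed metadata.
        return {b: self.data[b] for b in self.metadata}
```

In this model a block interrupted in step 1 stays on disk as garbage, but
nothing returned by referenced_blocks() ever points at it.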

The bigger argument for persisting the failure status seems to be that, if a
disk failed, it is probably either read-only, unmounted, or likely to continue
producing IOErrors after a restart, but we'd still like to start up the server,
right?

One concern about this is that, in the case of a transient, fixable error like
"out of space", it would certainly be nice to be able to restart without losing
data (e.g. consider the case where some rogue job fills up disks on many nodes
at once, so that several servers crash simultaneously).
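
One way to picture that distinction is a check that decides whether an error
should be recorded as a persistent disk failure. This is only a sketch: the
set of "transient" errno values and the function name should_persist_failure
are assumptions made for illustration, not anything in the patch.

```python
import errno

# Assumption: "out of space" is treated as transient and fixable by an operator.
TRANSIENT_ERRNOS = {errno.ENOSPC}


def should_persist_failure(err: OSError) -> bool:
    """Return True if the error should mark the disk as failed across restarts."""
    return err.errno not in TRANSIENT_ERRNOS
```

Under this sketch a full disk would not be marked as failed, so the server
could come back with its data once space is freed, while something like EIO
would still be persisted.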


-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
