Mike Percy has posted comments on this change. Change subject: disk failure: don't fail CHECKs for disk failures ......................................................................
Patch Set 2: (4 comments) I didn't finish getting through this whole patch yet but here are some high level comments. I feel like I'm little out of the loop. Is there a design doc or a header comment or something to summarize how disk error detection and handling works? Currently, there are cases that we didn't handle to be "safe" without certain durability guarantees. So it needs to be a little clearer how we are handling those, and I guess it either needs to be explicit error handling in each case (rollback, etc) or a way to mark the tablet as permanently failed / dead without affecting the rest of the server. http://gerrit.cloudera.org:8080/#/c/7442/2/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: Line 696: // TODO(unknown): need to figure out what to do with error handling if this This should be TODO(todd). Keeping this TODO makes me nervous. Is this handled or not? :) http://gerrit.cloudera.org:8080/#/c/7442/2/src/kudu/tablet/local_tablet_writer.h File src/kudu/tablet/local_tablet_writer.h: Line 104: if (tablet_->IsDataInFailedDir()) { This sort of feels out of place. See other comments http://gerrit.cloudera.org:8080/#/c/7442/2/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 803: void Tablet::ApplyRowOperations(WriteTransactionState* tx_state) { Seems like this should return Status now? http://gerrit.cloudera.org:8080/#/c/7442/2/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: Line 157: void ApplyRowOperations(WriteTransactionState* tx_state); Seems like this has a new (sort of awkward) contract now since you have to check some flags after it returns. That should at least be documented but more likely I think this should return a Status on failure. -- To view, visit http://gerrit.cloudera.org:8080/7442 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
