Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )
Change subject: WIP KUDU-2469 pt 2: fail replicas on CFile corruption ...................................................................... Patch Set 5: (6 comments) Still a WIP since I'd like to maybe add more docs to IOContext and how it should be used and add more tests, but just checking this in. http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11249/2//COMMIT_MSG@18 PS2, Line 18: : : : : > I read through the comment in error_manager.h but still don't understand wh I tried clarifying this in a separate patch: https://gerrit.cloudera.org/c/11303/ http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@81 PS2, Line 81: DEFINE_double(cfile_inject_corruption, 0, > One of the reasons I didn't do this while writing the checksum code was du Ah nice, thanks for the pointer. Seems pretty useful for unit testing, though I think I'll still add this since I find it a bit easier to use in more integration-y tests. I agree it's kind of clunky in terms of the number of locations, but IMO it also makes it easier to get broader coverage of errors. http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@227 PS2, Line 227: > Should we use a macro for this pattern? Something similar to the HANDLE_DIS Done http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/cfile/cfile_reader.cc@348 PS2, Line 348: if (PREDICT_FALSE(checksum_value != expected_checksum)) { > Can we assert that this exists? Done http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h File src/kudu/fs/error_manager.h: http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@76 PS2, Line 76: return _s; \ > Do we need a new handler type? Or is TABLET good enough to re-use? I tried clarifying this in this patch: https://gerrit.cloudera.org/c/11303/ Basically, the error handling should be based on the type of error, not the error handling callback. Sure, we want to fail a single tablet, but the TABLET handler is being used elsewhere, and changing the callback would have effects elsewhere. http://gerrit.cloudera.org:8080/#/c/11249/2/src/kudu/fs/error_manager.h@103 PS2, Line 103: > safety Ack -- To view, visit http://gerrit.cloudera.org:8080/11249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f Gerrit-Change-Number: 11249 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Wed, 29 Aug 2018 03:18:07 +0000 Gerrit-HasComments: Yes