Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )
Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption ...................................................................... Patch Set 9: (10 comments) http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc File src/kudu/cfile/bloomfile.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257 PS9, Line 257: return Status::Corruption("bloom file is compressed (compression not supported)", > I suppose the type of corruption is interesting. I'm also wondering why compression or a missing value index are grounds for corruption. They seem more like NotSupported to me (i.e backwards incompatible on-disk state that is being loaded by an older version of Kudu). Does it really make sense to fail these replicas? What happens if reader_->HandleCorruption() isn't called; does the server crash? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@234 PS9, Line 234: .CloneAndPrepend("failed to parse CFile pre-header"); Shouldn't the CloneAndPrepend be conditioned on !s.OK()? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h File src/kudu/fs/io_context.h: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h@30 PS9, Line 30: user s/user/module/ based on the text in the examples. http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@217 PS9, Line 217: ::testing::Values(ErrorType::CFILE_CORRUPTION)); Shouldn't this parameterize on DISK_FAILURE too? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@240 PS9, Line 240: // Wait a bit for some errors to occur. Can we ASSERT_EVENTUALLY() on something here, to ensure that we only run the ClusterVerifier when an error has actually occurred? Is there a concern that rereplication may kick in and finish too quickly for us to observe an error in the test? Maybe we can look at a metric? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@250 PS9, Line 250: // Set up a workload that does both reads and writes. But you're setting the number of write threads to 0... http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@272 PS9, Line 272: // Wait a bit for some errors to occur. Same feedback about waiting for an error here. http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@278 PS9, Line 278: return Status::Corruption("file does not have a value index!"); Again, seems more like NotSupported than Corruption. http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@291 PS9, Line 291: return Status::Corruption("missing delta stats from the delta file metadata"); Same here. http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc@1676 PS9, Line 1676: auto log_req = MakeScopedCleanup([&] { : LOG(INFO) << SecureDebugString(req); : }); Does SCOPED_TRACE not work in here? -- To view, visit http://gerrit.cloudera.org:8080/11249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f Gerrit-Change-Number: 11249 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 30 Aug 2018 17:58:18 +0000 Gerrit-HasComments: Yes
