Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11249 )

Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption
......................................................................


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc
File src/kudu/cfile/bloomfile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257
PS9, Line 257:     return Status::Corruption("bloom file is compressed 
(compression not supported)",
> I'm also wondering why compression or a missing value index are grounds for
Yeah, I agree with returning some other error, or just not handling the 
corruption since it's not indicative of a disk-related corruption. It might 
crash, depending on the operation (e.g. if this is done in the context of 
Flush, it will, if on a scan, compaction, it won't).

Regardless, failing the tablet seems inappropriate here.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185
PS9, Line 185:   RETURN_NOT_OK(ReadAndParseFooter(io_context));
> I think if we eliminate all the Corruption status that should actually be s
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@234
PS9, Line 234:       .CloneAndPrepend("failed to parse CFile pre-header");
> Shouldn't the CloneAndPrepend be conditioned on !s.OK()?
It could be, though CloneAndPrepend does that check already.


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@328
PS9, Line 328:     RETURN_NOT_OK(VerifyChecksum(io_context, slices, checksum));
> Can we handle the checksum corruption here with RETURN_NOT_OK_HANDLE_CORRUP
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h
File src/kudu/fs/io_context.h:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/fs/io_context.h@30
PS9, Line 30: user
> s/user/module/ based on the text in the examples.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@217
PS9, Line 217:     ::testing::Values(ErrorType::CFILE_CORRUPTION));
> Shouldn't this parameterize on DISK_FAILURE too?
Whoops yes


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@240
PS9, Line 240:   // Wait a bit for some errors to occur.
> Can we ASSERT_EVENTUALLY() on something here, to ensure that we only run th
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@250
PS9, Line 250:   // Set up a workload that does both reads and writes.
> But you're setting the number of write threads to 0...
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/integration-tests/disk_failure-itest.cc@272
PS9, Line 272:   // Wait a bit for some errors to occur.
> Same feedback about waiting for an error here.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@274
PS9, Line 274:   RETURN_NOT_OK(reader_->Init(io_context));
> Same comment in this file as the ones in the bloomfile code. Can we depend
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@278
PS9, Line 278:     return Status::Corruption("file does not have a value 
index!");
> Again, seems more like NotSupported than Corruption.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@291
PS9, Line 291:     return Status::Corruption("missing delta stats from the 
delta file metadata");
> Same here.
Done


http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tserver/tablet_server-test.cc@1676
PS9, Line 1676:     auto log_req = MakeScopedCleanup([&] {
              :       LOG(INFO) << SecureDebugString(req);
              :     });
> Does SCOPED_TRACE not work in here?
Indeed, seems ASSERT_EVENTUALLY and SCOPED_TRACE don't like each other. It 
would break the build as:

 ../../src/kudu/tserver/tablet_server-test.cc:1674:77: note: 
‘testing::internal::ScopedTrace gtest_trace_1683’ previously declared here

I'll just tweak this a bit, since this is kind of ugly.



--
To view, visit http://gerrit.cloudera.org:8080/11249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f
Gerrit-Change-Number: 11249
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 01 Sep 2018 07:11:29 +0000
Gerrit-HasComments: Yes

Reply via email to