Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11249 )
Change subject: KUDU-2469 pt 2: fail replicas on CFile corruption ...................................................................... Patch Set 9: (5 comments) I did a quick first pass. The theme of my feedback is that handling the corruption at the boundaries of the public api instead of each location of internal api should reduce the location we need to add handling code while still providing the same coverage. As long as the lower level calls return Status:Corruption the higher level handling can handle and fail the tablet. http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc File src/kudu/cfile/bloomfile.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@257 PS9, Line 257: return Status::Corruption("bloom file is compressed (compression not supported)", I suppose the type of corruption is interesting. It's clear the a Cfile with a checksum issue is corrupt and should be failed. How could some of these other corruptions happen? Missing index, compression unsupported, etc. All of these should result in checksum errors when initializing in theory. Is this catch-all safe? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/bloomfile.cc@344 PS9, Line 344: RETURN_NOT_OK(ParseBlockHeader(io_context, dblk_data.data(), &hdr, &bloom_data)); Can we rely on the the above reader_->ReadBlock to handle any corruption issues instead of handling all cases in the bloomfile code too? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@185 PS9, Line 185: RETURN_NOT_OK(ReadAndParseFooter(io_context)); Can we handle header and footer corruption here with RETURN_NOT_OK_HANDLE_CORRUPTION instead of at each location in the internal methods? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/cfile/cfile_reader.cc@328 PS9, Line 328: RETURN_NOT_OK(VerifyChecksum(io_context, slices, checksum)); Can we handle the checksum corruption here with RETURN_NOT_OK_HANDLE_CORRUPTION instead of at each location in the internal method? http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: http://gerrit.cloudera.org:8080/#/c/11249/9/src/kudu/tablet/deltafile.cc@274 PS9, Line 274: RETURN_NOT_OK(reader_->Init(io_context)); Same comment in this file as the ones in the bloomfile code. Can we depend on CfileReader correctly handling any corruption so that we don't need to handle it in all the wrappers? -- To view, visit http://gerrit.cloudera.org:8080/11249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63d541443bc68c83fd0ca6d51315143fee04d50f Gerrit-Change-Number: 11249 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 30 Aug 2018 17:13:31 +0000 Gerrit-HasComments: Yes
