Adar Dembo has posted comments on this change. Change subject: [WIP]log block manager: mark container as 'read_only' after syncing error ......................................................................
Patch Set 12: (20 comments) http://gerrit.cloudera.org:8080/#/c/7374/12//COMMIT_MSG Commit Message: PS12, Line 7: [WIP]log block manager: mark container as 'read_only' after : syncing error Since this is almost ready to go, remove the WIP prefix. Also, you should write either read-only or 'read only'. The former refers to the standard software definition of a noun that may only be read, not written, and the latter asks the reader to conjure up a more abstract notion of read only-ness (that's the effect of the single quotes) and apply it to a container. I think either is readable, but 'read_only' is neither and just serves to confuse. PS12, Line 15: For instance, when Close() block A in container a, SyncData() : is not successful, but metadata of block A is already on : disk. For merge compaction, the server does not crash for non-eio : error. Based on the existing behavior, the container will be : continued used, result in block A point to others' block data. Rewrite: For instance, suppose that SyncData() fails during Close() of block A. Even though Close() itself will fail, it's possible that block A's metadata was successfully written out to disk. Some Kudu operations (e.g. merge compaction) will not crash during a block error like this one, and if block A's container is reused when writing block B, block A's metadata may erroneously point at block B. PS12, Line 21: 'read_only' Nit: fix 'read_only' here too. PS12, Line 22: metdata metadata http://gerrit.cloudera.org:8080/#/c/7374/12/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: Line 843: // Regression test for KUDU-1793. This test simulates the case But we're no longer testing failures in Append(); that seems like important missing coverage. PS12, Line 844: metdata metadata Line 864: // Creates a block, writing the result to 'out' on success. Nit: update to reflect the effect of 'test_data'. Line 875: google::FlagSaver saver; Hmm, I wonder if using a FlagSaver on every block creation is expensive. Could you measure this test's running time before and after your change? Line 884: // Reads a block given by 'id'. We've also lost test coverage here by not actually verifying the result of Read(). PS12, Line 909: if (i % 2 == 0) { : s = create_a_block(&id, kLongTestData); : } else { : s = create_a_block(&id, kShortTestData); : } Nit: use a ternary: s = create_a_block(&id, i % 2 ? kShortTestData : kLongTestData); http://gerrit.cloudera.org:8080/#/c/7374/8/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1206: return Status::OK(); > oh I see, that's very subtle. Could you add a comment to that effect above +1 to Dan's suggestion. http://gerrit.cloudera.org:8080/#/c/7374/12/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS12, Line 378: read only Nit: I see read only and 'read_only' in the various comments you've added to this file; could you change them all to either read-only or 'read only' as per the suggestion I left in the commit description? PS12, Line 474: That means blocks : // cannot be deleted until next restart. And the container won't receive new blocks, right? PS12, Line 771: and Nit: don't need 'and'. PS12, Line 780: makes marks PS12, Line 788: // On the other hand, it is not certain that the syncing failure indicates : // the new blocks' metadata are not actually on disk. The double negative in this sentence makes it tougher to read, please rewrite. PS12, Line 1156: read_only read-only or read only Also, don't capitalize the first word; see my comment to Will about that here: https://gerrit.cloudera.org/#/c/7444/6/src/kudu/tools/tool_action_tablet.cc@149 PS12, Line 1645: should be aborted "is forbidden" Line 1656: return Status::Aborted("Container $0 is read_only", lb->container()->ToString()); See my other comment on how to phrase this. Also, I think IllegalState is more appropriate here, as nothing is being aborted. Line 1660: lb = RemoveLogBlock(block_id); Why acquire lock_ and do the map lookup twice? Please rewrite this so there's just one lock acquisition and one map lookup. The functions in gutil/map-util.h should help. -- To view, visit http://gerrit.cloudera.org:8080/7374 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15062b9d82c9cb563fb6bb2af7ec89da5f71e28f Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
