Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12308 )
Change subject: KUDU2665: LBM deletes the valid live block ...................................................................... Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/12308/1//COMMIT_MSG Commit Message: PS1: I'd like to see a targeted test for this. I think you can write a single threaded test in log_block_manager-test that does something like this: 1. Create a container. 2. Write a block to it. 3. In a loop, create, write, and finalize enough WritableBlocks in the container so that it becomes full. Maybe adjust log_container_max_size so you don't have to write too much, or adjust log_container_max_blocks instead. 4. When it's full, delete the block written in step #2. Now the container is full and has no live blocks. As per the old code, it should be marked as dead and removed from the LBM. 5. Close the WritableBlocks finalized in step 3. This should drop all referrents and cause the container to be erroneously deleted. 6. You can test whether the container got deleted, and/or you should be able to open and read the blocks created in step 3. Such a test should fail without this fix and pass with it. http://gerrit.cloudera.org:8080/#/c/12308/1//COMMIT_MSG@7 PS1, Line 7: KUDU2665: LBM deletes the valid live block Nit: rewrite as "KUDU-2665: LBM may delete containers with live blocks" http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@531 PS1, Line 531: // Increases the referenced number. : // Positive means increase, negative means decrease. : int32_t IncrWritingRef(int32_t value) { return writing_ref_.IncrementBy(value); } Nit: rename to "blocks_being_written_incr()" and change comment to "// Adjusts the number of blocks being written. Positive means increase, negative means decrease". http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@535 PS1, Line 535: // Returns the referenced number by LogWritableBlock. : int32_t WritingRefCount() const { return writing_ref_.Load(); } Nit: move this up to the "simple accessors" section and rename to blocks_being_written(). You can also drop the comment. http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@640 PS1, Line 640: // The number of references by LogWritableBlock. : AtomicInt<int32_t> writing_ref_; Nit: rename to blocks_being_written_ and change comment to "The number of LogWritableBlocks currently open for this container." http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1305 PS1, Line 1305: // Although this code looks like a TOCTOU violation, it is safe because of : // some additional LBM invariants: : // 1. When a container becomes full, it stays full for the process' lifetime. : // 2. A full container will never accrue another live block. Meaning, losing : // its last live block is a terminal state for a full container. This needs to be updated to reflect the role of WritingRefCount(). Maybe this? // Although this code looks like a TOCTOU violation, it is safe because of // some additional LBM invariants: // 1. When a container becomes full, it stays full for the process' lifetime. // 2. A full container will never accrue another live block. Meaning, losing // its last live block is a terminal state for a full container. // 3. The only exception to #2 is if the container currently has a finalized but not-yet-closed WritableBlock. In this case the container became full when the WritableBlock was finalized, but the live block counter only reflects the new block when it is closed. http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1421 PS1, Line 1421: // However, due to KUDU-2665, it's not safe to visit container's next block : // offset and live block count at the same time. Thus, we need another flag : // to identify whether the container is being written instead of a lock. Nit: this is more or less a retread of the "invariant" comment of TrySetDead. If you refactor this condition into a container method, you can move that comment into the method and drop this one. http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1424 PS1, Line 1424: if (container->full() && container->live_blocks() == 0 && container->WritingRefCount() == 0) { This is getting unwieldy to have two copies of; please decompose into a single container method. -- To view, visit http://gerrit.cloudera.org:8080/12308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I894f32b1c164ae7770c92171850edd167dfaf8ad Gerrit-Change-Number: 12308 Gerrit-PatchSet: 1 Gerrit-Owner: helifu <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 30 Jan 2019 20:22:40 +0000 Gerrit-HasComments: Yes
