Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12308 )

Change subject: KUDU2665: LBM deletes the valid live block
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12308/1//COMMIT_MSG
Commit Message:

PS1:
I'd like to see a targeted test for this. I think you can write a single 
threaded test in log_block_manager-test that does something like this:

1. Create a container.
2. Write a block to it.
3. In a loop, create, write, and finalize enough WritableBlocks in the 
container so that it becomes full. Maybe adjust log_container_max_size so you 
don't have to write too much, or adjust log_container_max_blocks instead.
4. When it's full, delete the block written in step #2. Now the container is 
full and has no live blocks. As per the old code, it should be marked as dead 
and removed from the LBM.
5. Close the WritableBlocks finalized in step 3. This should drop all 
referrents and cause the container to be erroneously deleted.
6. You can test whether the container got deleted, and/or you should be able to 
open and read the blocks created in step 3.

Such a test should fail without this fix and pass with it.


http://gerrit.cloudera.org:8080/#/c/12308/1//COMMIT_MSG@7
PS1, Line 7: KUDU2665: LBM deletes the valid live block
Nit: rewrite as "KUDU-2665: LBM may delete containers with live blocks"


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@531
PS1, Line 531:   // Increases the referenced number.
             :   // Positive means increase, negative means decrease.
             :   int32_t IncrWritingRef(int32_t value) { return 
writing_ref_.IncrementBy(value); }
Nit: rename to "blocks_being_written_incr()" and change comment to "// Adjusts 
the number of blocks being written. Positive means increase, negative means 
decrease".


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@535
PS1, Line 535:   // Returns the referenced number by LogWritableBlock.
             :   int32_t WritingRefCount() const { return writing_ref_.Load(); }
Nit: move this up to the "simple accessors" section and rename to 
blocks_being_written(). You can also drop the comment.


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@640
PS1, Line 640:   // The number of references by LogWritableBlock.
             :   AtomicInt<int32_t> writing_ref_;
Nit: rename to blocks_being_written_ and change comment to "The number of 
LogWritableBlocks currently open for this container."


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1305
PS1, Line 1305:   // Although this code looks like a TOCTOU violation, it is 
safe because of
              :   // some additional LBM invariants:
              :   // 1. When a container becomes full, it stays full for the 
process' lifetime.
              :   // 2. A full container will never accrue another live block. 
Meaning, losing
              :   //    its last live block is a terminal state for a full 
container.
This needs to be updated to reflect the role of WritingRefCount(). Maybe this?

  // Although this code looks like a TOCTOU violation, it is safe because of
  // some additional LBM invariants:
  // 1. When a container becomes full, it stays full for the process' lifetime.
  // 2. A full container will never accrue another live block. Meaning, losing
  //    its last live block is a terminal state for a full container.
  // 3. The only exception to #2 is if the container currently has a finalized 
but
        not-yet-closed WritableBlock. In this case the container became full
        when the WritableBlock was finalized, but the live block counter only
        reflects the new block when it is closed.


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1421
PS1, Line 1421:     // However, due to KUDU-2665, it's not safe to visit 
container's next block
              :     // offset and live block count at the same time. Thus, we 
need another flag
              :     // to identify whether the container is being written 
instead of a lock.
Nit: this is more or less a retread of the "invariant" comment of TrySetDead. 
If you refactor this condition into a container method, you can move that 
comment into the method and drop this one.


http://gerrit.cloudera.org:8080/#/c/12308/1/src/kudu/fs/log_block_manager.cc@1424
PS1, Line 1424:     if (container->full() && container->live_blocks() == 0 && 
container->WritingRefCount() == 0) {
This is getting unwieldy to have two copies of; please decompose into a single 
container method.



--
To view, visit http://gerrit.cloudera.org:8080/12308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I894f32b1c164ae7770c92171850edd167dfaf8ad
Gerrit-Change-Number: 12308
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 30 Jan 2019 20:22:40 +0000
Gerrit-HasComments: Yes

Reply via email to