Erwin Tam commented on ZOOKEEPER-464:

There was an intermittent problem with the ledger delete junit tests prior to 
the last patch uploaded (which resolved it).  I'll document the bug and the fix 
for that here.

The ledger delete junit tests were failing intermittently.  The cause is related 
to an issue I saw earlier when I was running unit tests with a very small entry 
log size limit (2K).  When an entry log rolls over, we create a new one by 
first writing the 1024-byte "BKLO" header to the beginning of the file.  The 
problem is that this byte buffer object is statically defined.  In our junit 
tests, we have multiple Bookie servers (and thus EntryLogger instances) in the 
same JVM.  If more than one EntryLogger is rolling over its current log and 
writing the next one, they all access the same entry log header buffer.  This 
creates problems since the static header isn't accessed in a synchronized way.  

This header byte buffer is cleared before it is written to the log file. 
Since it is static, one thread could clear it, and then another thread (from 
a second Bookie server) could clear it at about the same time.  The first 
thread writes the header, and when it is done, the buffer's position is left at 
the end and is not reset.  The second thread then writes from a header buffer 
that has not been cleared/reset, so nothing remains to be written.  What ends 
up happening is that the entry logs in the second Bookie are created without 
the header.  When we read through those files later on to figure out which 
ledgers they contain, we read incorrect values and try to allocate byte 
buffers based on an incorrect length value (basically junk random bytes).  
This causes the Java heap space error.
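
To make the interleaving above concrete, here is a minimal single-threaded 
sketch that replays it deterministically.  It is not the EntryLogger code: the 
class, field and method names are made up for illustration, and an in-memory 
stream stands in for the entry log file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class SharedHeaderDemo {
    // Hypothetical stand-in for the statically defined 1024-byte "BKLO" header.
    static final ByteBuffer SHARED_HEADER = ByteBuffer.allocate(1024);
    static {
        SHARED_HEADER.put("BKLO".getBytes(StandardCharsets.US_ASCII));
    }

    // Writes whatever remains in the buffer to a fresh in-memory "log file"
    // and returns the number of bytes that ended up in it.
    public static int writeHeader(ByteBuffer header) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        WritableByteChannel channel = Channels.newChannel(file);
        while (header.hasRemaining()) {
            channel.write(header);
        }
        return file.size();
    }

    // Replays the bad ordering: both writers clear, then both write.
    public static int[] replayInterleaving() throws IOException {
        SHARED_HEADER.clear();                       // writer A clears
        SHARED_HEADER.clear();                       // writer B clears
        int bytesA = writeHeader(SHARED_HEADER);     // A drains the buffer
        int bytesB = writeHeader(SHARED_HEADER);     // B finds nothing left
        return new int[] { bytesA, bytesB };
    }

    public static void main(String[] args) throws IOException {
        int[] written = replayInterleaving();
        System.out.println("writer A wrote " + written[0] + " bytes");
        System.out.println("writer B wrote " + written[1] + " bytes");
    }
}
```

In this replay the first writer gets the full 1024 bytes and the second gets 0, 
which corresponds to an entry log created without its header.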

The fix is simple: make this log file header a non-static variable and 
initialize it in the EntryLogger constructor.  In practice, we shouldn't be 
running multiple Bookies within the same JVM, so we wouldn't normally run into 
this problem.
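
The shape of that fix can be sketched roughly as follows.  This is only an 
illustration under assumed names (PerInstanceHeaderSketch, logfileHeader, 
headerForNewLog), not the code from the patch: each instance allocates its own 
header buffer in its constructor, so draining one instance's buffer cannot 
affect another's.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PerInstanceHeaderSketch {
    static final int HEADER_SIZE = 1024;

    // Non-static: every instance owns its own header buffer.
    private final ByteBuffer logfileHeader;

    public PerInstanceHeaderSketch() {
        logfileHeader = ByteBuffer.allocate(HEADER_SIZE);
        logfileHeader.put("BKLO".getBytes(StandardCharsets.US_ASCII));
    }

    // Resets position and limit, then hands back the buffer so the caller
    // can write the full header to a new log file.
    public ByteBuffer headerForNewLog() {
        logfileHeader.clear();
        return logfileHeader;
    }

    public static void main(String[] args) {
        PerInstanceHeaderSketch first = new PerInstanceHeaderSketch();
        PerInstanceHeaderSketch second = new PerInstanceHeaderSketch();

        ByteBuffer a = first.headerForNewLog();
        ByteBuffer b = second.headerForNewLog();

        // Simulate the first instance writing its header out completely.
        a.position(a.limit());

        System.out.println("first remaining:  " + a.remaining());
        System.out.println("second remaining: " + b.remaining());
    }
}
```

Note that a single instance would still need its own synchronization if 
several threads could roll its log at once; the sketch only removes the 
sharing between instances.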

> Need procedure to garbage collect ledgers
> -----------------------------------------
>                 Key: ZOOKEEPER-464
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-464
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Erwin Tam
>             Fix For: 3.4.0
>         Attachments: zookeeper-464-log.txt, ZOOKEEPER-464.patch, 
> ZOOKEEPER-464.patch, ZOOKEEPER-464.patch
> An application using BookKeeper is likely to use a large number of ledgers 
> over time. Such an application might not need all ledgers created over time 
> and might want to delete some of these ledgers to free up some space on 
> bookies. The idea of this jira is to implement a procedure that enables an 
> application to garbage-collect unwanted ledgers.
> To garbage-collect a ledger, we need to delete the ledger metadata on 
> ZooKeeper, and delete the ledger data on corresponding bookies. 

