Erwin Tam updated ZOOKEEPER-464:

    Attachment: ZOOKEEPER-464.patch

Revised patch for the entry log garbage collection feature. A high-level
description of how it works is included here. Only the major files modified
are listed.

1. Client changes:

Added BK client methods to delete a ledger, both synchronously and
asynchronously. Deleting a ledger on the client side simply removes its ZK
ledger metadata node (at the /ledgers/L<ledger id> path).
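A minimal sketch of the client-side delete, assuming an in-memory stand-in for
the ZK metadata tree instead of a real ZooKeeper connection. The names
InMemoryMetadataStore, ledger_path, delete_ledger and async_delete_ledger are
hypothetical, not the BK client API from the patch, and the exact rendering of
the ledger ID in the node name is an assumption. With the real ZooKeeper Java
client the equivalent calls would be ZooKeeper.delete(path, version) and its
asynchronous overload that takes a VoidCallback.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class InMemoryMetadataStore:
    """Hypothetical stand-in for the ZK ledger metadata tree (not a real ZK client)."""

    def __init__(self) -> None:
        self._nodes: set[str] = set()
        self._lock = threading.Lock()

    def create(self, path: str) -> None:
        with self._lock:
            self._nodes.add(path)

    def delete(self, path: str) -> None:
        with self._lock:
            if path not in self._nodes:
                raise KeyError(path)  # analogous to ZK's "no node" error
            self._nodes.remove(path)

    def exists(self, path: str) -> bool:
        with self._lock:
            return path in self._nodes


def ledger_path(ledger_id: int) -> str:
    # Assumed rendering of /ledgers/L<ledger id>; the real node name may be zero-padded.
    return f"/ledgers/L{ledger_id}"


def delete_ledger(store: InMemoryMetadataStore, ledger_id: int) -> None:
    """Synchronous delete: removing the metadata node is the entire client-side step."""
    store.delete(ledger_path(ledger_id))


DeleteCallback = Callable[[int, Optional[Exception]], None]


def async_delete_ledger(
    store: InMemoryMetadataStore,
    ledger_id: int,
    executor: ThreadPoolExecutor,
    callback: Optional[DeleteCallback] = None,
) -> Future:
    """Asynchronous delete: same operation on an executor thread, result via callback and Future."""

    def task() -> None:
        error: Optional[Exception] = None
        try:
            delete_ledger(store, ledger_id)
        except Exception as exc:  # report any failure to the callback
            error = exc
        if callback is not None:
            callback(ledger_id, error)
        if error is not None:
            raise error  # also surface the failure through the Future

    return executor.submit(task)
```

The asynchronous variant reports its outcome both through the callback and the
returned Future, so a caller can either react in the callback or block on
result().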

2. Bookie Server changes:

BK servers are now also ZK clients, so they can query ZK for the existing
ledger nodes to determine the current set of active ledgers. When initialized,
a Bookie creates the ZK client instance and registers an ephemeral node in ZK
for itself.
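Using an ephemeral node means ZK deletes the registration automatically when
the bookie's session ends, so a bookie that shuts down or loses its session
drops out of the registered set. The sketch below models only that behavior
with a hypothetical in-memory FakeZooKeeper; the registration path
/ledgers/available and the host:port node name are assumptions, not taken from
the patch. A real bookie would create the node with CreateMode.EPHEMERAL
through the ZooKeeper Java client.

```python
import itertools


class FakeZooKeeper:
    """Hypothetical in-memory model of ZK ephemeral nodes (not a real ZK client)."""

    def __init__(self) -> None:
        self._next_session = itertools.count(1)
        self._owner: dict[str, int] = {}  # ephemeral node path -> owning session ID

    def open_session(self) -> int:
        return next(self._next_session)

    def create_ephemeral(self, session_id: int, path: str) -> None:
        if path in self._owner:
            raise FileExistsError(path)  # ZK would report "node exists"
        self._owner[path] = session_id

    def close_session(self, session_id: int) -> None:
        # When a session ends, ZK deletes every ephemeral node it created.
        for path in [p for p, owner in self._owner.items() if owner == session_id]:
            del self._owner[path]

    def children(self, parent: str) -> list[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._owner if p.startswith(prefix))


class Bookie:
    # Assumed registration directory; the patch description does not name it.
    REGISTRATION_ROOT = "/ledgers/available"

    def __init__(self, zk: FakeZooKeeper, address: str) -> None:
        self.zk = zk
        self.session = zk.open_session()  # the bookie acts as a ZK client
        zk.create_ephemeral(self.session, f"{self.REGISTRATION_ROOT}/{address}")

    def shutdown(self) -> None:
        # Closing (or losing) the session is what removes the registration.
        self.zk.close_session(self.session)
```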

When initialized, the EntryLogger tries to extract the set of ledgers contained
in each already existing entry log (if any). It does not extract ledgers from
the current active entry log being written to, since the active log is never
deleted. Whenever the entry log rolls over, the EntryLogger reads through the
older entry logs and extracts the set of ledgers whose entries each one
contains (if this has not been done already). This data is stored in memory as
a mapping from entry log IDs to the sets of ledger IDs they contain.
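A minimal sketch of building the entry-log-to-ledgers mapping. The record
layout used here (a 4-byte big-endian length followed by an entry whose first
8 bytes are the ledger ID) is a simplified assumption for illustration, not
necessarily the EntryLogger's actual on-disk format, and all function names are
hypothetical. As described above, the active log is skipped and logs already
present in the mapping are not rescanned.

```python
import io
import struct
from typing import BinaryIO


def ledgers_in_entry_log(stream: BinaryIO) -> set[int]:
    """Scan one entry log and return the IDs of the ledgers whose entries it holds.

    Assumed (simplified) record layout: a 4-byte big-endian length N, then N
    bytes of entry data whose first 8 bytes are the big-endian ledger ID.
    """
    ledgers: set[int] = set()
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of log (or a truncated trailing length field)
        (size,) = struct.unpack(">i", header)
        if size < 8:
            break  # malformed record: too short to hold a ledger ID
        entry = stream.read(size)
        if len(entry) < size:
            break  # partial write at the tail; stop scanning
        (ledger_id,) = struct.unpack_from(">q", entry, 0)
        ledgers.add(ledger_id)
    return ledgers


def extract_all(
    logs: dict[int, BinaryIO],
    current_log_id: int,
    known: dict[int, set[int]],
) -> dict[int, set[int]]:
    """Fill `known` (entry log ID -> ledger IDs) for every log not yet scanned."""
    for log_id, stream in logs.items():
        if log_id == current_log_id or log_id in known:
            continue  # skip the active log and logs already extracted
        known[log_id] = ledgers_in_entry_log(stream)
    return known


def encode_entry(ledger_id: int, entry_id: int, payload: bytes) -> bytes:
    """Build one record in the assumed layout (for this example only)."""
    body = struct.pack(">qq", ledger_id, entry_id) + payload
    return struct.pack(">i", len(body)) + body
```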

When initialized, the LedgerCache reads through all of the existing ledger
index files (if any) and stores the corresponding ledgers in memory as the set
of active ledgers that the Bookie Server knows about. When a new ledger index
file is created (i.e., for a new ledger), that ledger is added to this
in-memory set.
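A sketch of the LedgerCache's in-memory active set, assuming (hypothetically,
not from the patch) one index file per ledger named "<ledger id in hex>.idx"
in a single directory. LedgerCacheSketch and its methods are illustrative
names only. The lock is there on the assumption that the garbage collector
thread reads the set while other threads may be adding new ledgers.

```python
import threading
from pathlib import Path


class LedgerCacheSketch:
    """Tracks the ledgers this bookie has index files for (hypothetical layout)."""

    SUFFIX = ".idx"

    def __init__(self, index_dir: Path) -> None:
        self.index_dir = index_dir
        self._lock = threading.Lock()
        # At startup, every existing index file marks a ledger this bookie knows about.
        self._active: set[int] = {
            int(p.stem, 16) for p in index_dir.glob("*" + self.SUFFIX)
        }

    def create_ledger_index(self, ledger_id: int) -> Path:
        # A new ledger gets an index file and joins the in-memory active set.
        path = self.index_dir / f"{ledger_id:x}{self.SUFFIX}"
        path.touch()
        with self._lock:
            self._active.add(ledger_id)
        return path

    def snapshot(self) -> set[int]:
        # Return a copy so callers (e.g. a GC thread) can iterate without holding the lock.
        with self._lock:
            return set(self._active)
```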

The EntryLogger contains the Garbage Collector thread, which runs periodically.
Each run first syncs with ZK and then reads the current set of active ledgers.
It compares this with the set of active ledgers held in the Bookie Server's
LedgerCache. Any ledger in the LedgerCache set but not in ZK has been deleted
by a client, so it is removed from that set. Next, the thread loops through all
of the older entry logs (every log except the one currently being written to)
and checks which ledgers make up each entry log. Any of those ledgers that have
been deleted are removed from that entry log's ledger set. If an entry log has
no active ledgers left, the entry log itself is deleted.
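One garbage-collection pass, sketched under stated assumptions: the current ZK
ledger set is passed in already fetched (with the real Java client, roughly a
sync on the ledgers path followed by getChildren), the in-memory structures are
plain sets and dicts, and gc_pass, GarbageCollectorThread and delete_entry_log
are hypothetical names. A ledger counts as deleted when its metadata node is no
longer in ZK, matching the client-side delete described in section 1.

```python
import threading
from typing import Callable


def gc_pass(
    zk_ledgers: set[int],
    active_ledgers: set[int],
    entry_log_ledgers: dict[int, set[int]],
    current_log_id: int,
    delete_entry_log: Callable[[int], None],
) -> list[int]:
    """Run one GC pass; mutates the in-memory sets and returns the deleted log IDs."""
    # Ledgers the bookie knows about but ZK no longer lists were deleted by a client.
    active_ledgers -= active_ledgers - zk_ledgers

    removed_logs: list[int] = []
    for log_id in sorted(entry_log_ledgers):  # iterate over a copy of the keys
        if log_id == current_log_id:
            continue  # never collect the log currently being written to
        ledgers = entry_log_ledgers[log_id]
        ledgers &= zk_ledgers  # in place: drop ledgers whose metadata is gone
        if not ledgers:
            delete_entry_log(log_id)  # no live ledgers left, reclaim the file
            del entry_log_ledgers[log_id]
            removed_logs.append(log_id)
    return removed_logs


class GarbageCollectorThread(threading.Thread):
    """Calls run_once every `interval` seconds until stop() (hypothetical wiring)."""

    def __init__(self, interval: float, run_once: Callable[[], None]) -> None:
        super().__init__(daemon=True)
        self.interval = interval
        self.run_once = run_once
        self._halt = threading.Event()

    def run(self) -> None:
        # Event.wait returns True as soon as stop() is called, ending the loop.
        while not self._halt.wait(self.interval):
            self.run_once()

    def stop(self) -> None:
        self._halt.set()
```

Filtering each entry log's set against ZK directly (rather than only against
ledgers removed in this pass) is a choice of this sketch: it also drops ledgers
that were deleted while the bookie was not tracking them.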

3. Unit tests:

Added a new unit test for garbage collection of entry logs. The test writes
enough ledger entries to roll over the first entry log. The ledgers are then
deleted from the client side, and the BK server's garbage collector thread
picks up these changes and deletes the entry log file(s). There are two
variations of this test: one in which the BK servers start up with no existing
entry logs, and one in which entry logs already exist.

> Need procedure to garbage collect ledgers
> -----------------------------------------
>                 Key: ZOOKEEPER-464
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-464
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Erwin Tam
>             Fix For: 3.4.0
>         Attachments: ZOOKEEPER-464.patch
> An application using BookKeeper is likely to use a large number of ledgers 
> over time. Such an application might not need all ledgers created over time 
> and might want to delete some of these ledgers to free up some space on 
> bookies. The idea of this jira is to implement a procedure that enables an 
> application to garbage-collect unwanted ledgers.
> To garbage-collect a ledger, we need to delete the ledger metadata on 
> ZooKeeper, and delete the ledger data on corresponding bookies. 

This message is automatically generated by JIRA.