Erwin Tam updated ZOOKEEPER-464:
Revised patch for the entry log garbage collection feature. A high-level
description of how it works follows. Only the major modified files are
covered.
1. Client changes:
Added BK client methods to delete a ledger, both synchronously and
asynchronously. From the client side, deleting a ledger just removes the
ZK ledger metadata node for it (at the /ledgers/L<ledger id> path).
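As a rough illustration of the client-side delete described above, here is a
minimal sketch. It is not the code from the patch: the class and method names
are hypothetical, an in-memory set stands in for the ZK namespace, and the
decimal formatting of the ledger id in the path is an assumption.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the Set below stands in for the ZK namespace.
class LedgerDeleteSketch {
    // From the description: ledger metadata lives at /ledgers/L<ledger id>.
    // Formatting the id as a plain decimal number is an assumption.
    static String ledgerPath(long ledgerId) {
        return "/ledgers/L" + ledgerId;
    }

    // In-memory stand-in for the ledger metadata nodes in ZK.
    private final Set<String> znodes = ConcurrentHashMap.newKeySet();

    void createLedger(long ledgerId) {
        znodes.add(ledgerPath(ledgerId));
    }

    boolean exists(long ledgerId) {
        return znodes.contains(ledgerPath(ledgerId));
    }

    // Synchronous delete: remove the metadata node; true if it existed.
    boolean deleteLedger(long ledgerId) {
        return znodes.remove(ledgerPath(ledgerId));
    }

    // Asynchronous delete: the same operation, completed on another thread.
    CompletableFuture<Boolean> deleteLedgerAsync(long ledgerId) {
        return CompletableFuture.supplyAsync(() -> deleteLedger(ledgerId));
    }
}
```

Note that the delete touches only metadata. Nothing is sent to the bookies at
this point; they notice the missing node later and reclaim the space.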
2. Bookie Server changes:
BK servers are now also ZK clients, so they can query ZK for the existing
ledger nodes to determine the current set of active ledgers. On
initialization, a Bookie creates the ZK client instance and registers an
ephemeral node in ZK for the server.
On initialization, EntryLogger tries to extract the set of ledgers that make
up each already existing entry log (if any). We do not extract ledgers from
the active entry log that we are currently writing into, since it will never
be deleted. When entry logs roll over, we read through the old entry logs and
extract the set of ledgers whose entries they contain (if this hasn't been
done already). This data is stored in memory as a mapping from entry log IDs
to the sets of ledger IDs that comprise them.
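The in-memory mapping described above might look roughly like the sketch
below. The names are hypothetical, and the on-disk entry log format is not
modelled: each entry log is represented simply as an array holding the ledger
ID of each of its entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the entry log ID -> ledger ID set mapping.
class EntryLogIndexSketch {
    private final Map<Long, Set<Long>> entryLogToLedgers = new HashMap<>();

    // entryLedgerIds holds the ledger ID of every entry in the log, in order.
    // This is a simplification; the real log would have to be parsed from disk.
    void extractLedgers(long entryLogId, long[] entryLedgerIds) {
        if (entryLogToLedgers.containsKey(entryLogId)) {
            return; // already extracted, skip the re-scan
        }
        Set<Long> ledgers = new HashSet<>();
        for (long ledgerId : entryLedgerIds) {
            ledgers.add(ledgerId); // duplicates collapse into the set
        }
        entryLogToLedgers.put(entryLogId, ledgers);
    }

    // Ledgers known to have entries in the given log (empty if not scanned).
    Set<Long> ledgersOf(long entryLogId) {
        return entryLogToLedgers.getOrDefault(entryLogId, Collections.emptySet());
    }
}
```

Keeping one set per entry log means a log can be judged deletable by checking
whether its set is empty, without re-reading the log file.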
On initialization, LedgerCache reads through all of the existing ledger index
files (if any) and stores them in memory as the set of active ledgers that
the Bookie Server knows about. When a new ledger index file is created (i.e.
for a new ledger), that ledger is added to this in-memory set.
The EntryLogger contains the Garbage Collector thread, which runs
periodically. It first syncs with ZK, then reads the current set of active
ledgers from ZK. It compares this to the set of active ledgers held in the
Bookie Server's LedgerCache. Any ledgers that are in the LedgerCache set but
not in ZK are removed. Then we loop through all of the older entry logs (other
than the one we're writing into) and check which ledgers make up each entry
log. If any of those ledgers have been deleted, we remove them from the entry
log's ledger set. If an entry log has no more active ledgers associated with
it, we delete the entry log.
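A single pass of the garbage collection described above can be sketched as
follows. This is only an illustration under stated assumptions: the names are
hypothetical, the ZK sync and the actual file deletion are left out, and the
inputs are plain in-memory collections rather than the patch's own classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one garbage collection pass.
class GarbageCollectorSketch {
    // zkActive:          ledgers that still have a metadata node in ZK
    // bookieActive:      ledgers the bookie knows about (modified in place)
    // entryLogToLedgers: entry log ID -> ledgers in that log (modified in place)
    // currentLogId:      the log being written into, which is never collected
    // Returns the IDs of the entry logs that should be deleted.
    static List<Long> collect(Set<Long> zkActive, Set<Long> bookieActive,
                              Map<Long, Set<Long>> entryLogToLedgers,
                              long currentLogId) {
        // Step 1: drop ledgers the bookie knows about that are gone from ZK.
        bookieActive.retainAll(zkActive);

        // Step 2: prune each older entry log's ledger set, collect empty logs.
        List<Long> deleted = new ArrayList<>();
        Iterator<Map.Entry<Long, Set<Long>>> it =
                entryLogToLedgers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Set<Long>> entry = it.next();
            long entryLogId = entry.getKey();
            if (entryLogId == currentLogId) {
                continue; // skip the log we are writing into
            }
            entry.getValue().retainAll(bookieActive);
            if (entry.getValue().isEmpty()) {
                it.remove(); // no active ledgers left in this log
                deleted.add(entryLogId);
            }
        }
        return deleted;
    }
}
```

For example, with ledgers {1, 3} active in ZK, a bookie that knows {1, 2, 3},
and older logs 0 -> {2} and 1 -> {1, 2}, only log 0 is returned for deletion;
log 1 still holds entries for the active ledger 1.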
3. Unit tests:
New unit test for garbage collection of entry logs. The test writes enough
ledger entries to roll over the first entry log. The ledgers are then deleted
from the client side, and the BK server's garbage collecting thread picks up
these changes and deletes the entry log file(s). There are two variations of
this test: one for the initial case where the BK servers start up with no
existing entry logs, and a second where entry logs already exist.
> Need procedure to garbage collect ledgers
> Key: ZOOKEEPER-464
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-464
> Project: Zookeeper
> Issue Type: New Feature
> Components: contrib-bookkeeper
> Reporter: Flavio Paiva Junqueira
> Assignee: Erwin Tam
> Fix For: 3.4.0
> Attachments: ZOOKEEPER-464.patch
> An application using BookKeeper is likely to use a large number of ledgers
> over time. Such an application might not need all ledgers created over time
> and might want to delete some of these ledgers to free up some space on
> bookies. The idea of this jira is to implement a procedure that enables an
> application to garbage-collect unwanted ledgers.
> To garbage-collect a ledger, we need to delete the ledger metadata on
> ZooKeeper, and delete the ledger data on corresponding bookies.