[ https://issues.apache.org/jira/browse/ZOOKEEPER-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erwin Tam updated ZOOKEEPER-464:
--------------------------------

    Attachment: ZOOKEEPER-464.patch

Revised patch for the entry log garbage collection feature. A high-level description of how it works follows. Only the major files modified are listed.

1. Client changes:
BookKeeper.java
LedgerDeleteOp.java
AsyncCallback.java

Added BK client methods to delete a ledger, both synchronously and asynchronously. Deleting a ledger from the client side just removes the ZK ledger metadata node for it (in the /ledgers/L<ledger id> path).

2. Bookie Server changes:
EntryLogger.java
LedgerCache.java
Bookie.java

BK servers are now also ZK clients, so they can query ZK for the existing ledger nodes to see which ledgers are currently active. When initialized, a Bookie creates the ZK client instance and registers an ephemeral node in ZK for the server.

EntryLogger, when initialized, tries to extract all of the ledgers that make up the already existing entry logs (if any). We do not extract ledgers from the current active entry log that we are writing into, since this will never be deleted. When entry logs roll over, we read through the old entry logs and extract the set of ledgers that make up all of the entries in each entry log (if it hasn't been done already). This data is stored in memory as a mapping from entry log IDs to the set of ledger IDs that comprise them.

LedgerCache, when initialized, reads through all of the existing ledger index files (if any) and stores them in memory as the set of active ledgers that the Bookie Server knows about. When a new ledger index file is created (new ledger), we add it to this in-memory set.

The EntryLogger contains the Garbage Collector thread, which runs periodically. It first syncs with ZK, then reads the set of currently active ledgers. It compares this to the LedgerCache's set of active ledgers that the Bookie Server knows about.
If there are any in that set but not in ZK, these are removed. Then we loop through all of the older entry logs (other than the one we're writing into) and see which ledgers make up each entry log. If any of those ledgers have been deleted, we remove them from the entry log's ledger set. If an entry log has no more active ledgers associated with it, we delete the entry log.

3. Unit tests:
LedgerDeleteTest.java

New unit test for entry log garbage collection. The test writes enough ledger entries to roll over the first entry log. The ledgers are then deleted from the client side, and the BK server's garbage collection thread picks up these changes and deletes the entry log file(s). There are two variations of this test: one for the initial case where the BK servers start up with no existing entry logs, and a second where entry logs already exist.

> Need procedure to garbage collect ledgers
> -----------------------------------------
>
>                 Key: ZOOKEEPER-464
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-464
>             Project: Zookeeper
>          Issue Type: New Feature
>          Components: contrib-bookkeeper
>            Reporter: Flavio Paiva Junqueira
>            Assignee: Erwin Tam
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-464.patch
>
>
> An application using BookKeeper is likely to use a large number of ledgers
> over time. Such an application might not need all ledgers created over time
> and might want to delete some of these ledgers to free up some space on
> bookies. The idea of this jira is to implement a procedure that enables an
> application to garbage-collect unwanted ledgers.
> To garbage-collect a ledger, we need to delete the ledger metadata on
> ZooKeeper, and delete the ledger data on corresponding bookies.
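
For readers who want the garbage collection pass from the Bookie Server changes above in concrete form, here is a minimal sketch of the set comparison it describes. It is not code from the patch: the class name EntryLogGcSketch, the method collect, and its parameters are hypothetical, ZK is replaced by a plain in-memory set, and the map is assumed to contain only rolled-over entry logs (never the one currently being written).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are not the classes from the patch.
final class EntryLogGcSketch {

    /**
     * Runs one garbage collection pass over in-memory state.
     *
     * @param zkActiveLedgers     ledger IDs that still have a metadata node in ZK (assumed input)
     * @param bookieActiveLedgers ledger IDs the bookie knows about; pruned in place
     * @param entryLogToLedgers   rolled-over entry log ID -> ledger IDs it contains; pruned in place
     * @return IDs of entry logs that have no active ledgers left
     */
    static List<Long> collect(Set<Long> zkActiveLedgers,
                              Set<Long> bookieActiveLedgers,
                              Map<Long, Set<Long>> entryLogToLedgers) {
        // Step 1: forget ledgers the bookie knows about but ZK no longer lists.
        bookieActiveLedgers.retainAll(zkActiveLedgers);

        List<Long> deletedEntryLogs = new ArrayList<>();
        Iterator<Map.Entry<Long, Set<Long>>> it = entryLogToLedgers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Set<Long>> entry = it.next();
            // Step 2: drop deleted ledgers from this entry log's ledger set.
            entry.getValue().retainAll(bookieActiveLedgers);
            // Step 3: an entry log with no active ledgers left can be deleted.
            if (entry.getValue().isEmpty()) {
                deletedEntryLogs.add(entry.getKey());
                it.remove(); // a real bookie would also delete the file on disk here
            }
        }
        return deletedEntryLogs;
    }

    public static void main(String[] args) {
        Set<Long> zk = new HashSet<>(Arrays.asList(1L, 3L));          // ledger 2 was deleted by a client
        Set<Long> bookie = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Map<Long, Set<Long>> logs = new HashMap<>();
        logs.put(0L, new HashSet<>(Arrays.asList(2L)));               // only the deleted ledger
        logs.put(1L, new HashSet<>(Arrays.asList(1L, 2L)));           // still holds ledger 1
        System.out.println("Entry logs to delete: " + collect(zk, bookie, logs));
    }
}
```

In this example, entry log 0 holds only the deleted ledger and is collected, while entry log 1 survives because ledger 1 is still active. Mutating the sets in place mirrors the description above, where the in-memory state is updated on each periodic run.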