[
https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890396#comment-13890396
]
Amit Jain commented on OAK-377:
-------------------------------
The implementation of an external mark-and-sweep blob garbage collector using
DocumentMK is ready. The GC iterates over all nodes to identify the referenced
blobs, computes the set difference against the blobs in the blob store, and
deletes the blobs that are no longer referenced. It uses the existing
BlobReferenceIterator to iterate over the tree.
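To illustrate the set-difference idea described above, here is a minimal
sketch in plain Java. It is not the code from the patch: the class
MarkSweepSketch and its mark() and sweep() helpers are hypothetical names, and
an in-memory Set of chunk ids stands in for both the node tree and the blob
store.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the MarkSweepGarbageCollector from the patch.
class MarkSweepSketch {

    /** Mark phase: collect every chunk id that is still referenced. */
    static Set<String> mark(Iterable<String> referencedChunkIds) {
        Set<String> marked = new HashSet<>();
        for (String id : referencedChunkIds) {
            marked.add(id);
        }
        return marked;
    }

    /**
     * Sweep phase: compute (store minus marked), delete those ids from the
     * store, and return the deleted ids in sorted order.
     */
    static Set<String> sweep(Set<String> store, Set<String> marked) {
        Set<String> garbage = new TreeSet<>(store);
        garbage.removeAll(marked);   // set difference: present but unreferenced
        store.removeAll(garbage);    // delete the unreferenced chunks
        return garbage;
    }

    public static void main(String[] args) {
        // Assumed sample data: four chunks in the store, two still referenced.
        Set<String> store = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> marked = mark(Arrays.asList("a", "b"));
        Set<String> deleted = sweep(store, marked);
        System.out.println("deleted: " + deleted);
        System.out.println("remaining: " + new TreeSet<>(store));
    }
}
```

A real collector would stream both id sets rather than hold them in memory,
which is presumably the role of the GarbageCollectorFileState helper listed
below.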
High-level change log:
* Interface added - BlobGarbageCollector
** public void garbageCollect(NodeStore nodeStore) throws Exception;
* GC Implementation - MarkSweepGarbageCollector
** public void garbageCollect(NodeStore nodeStore) throws Exception;
** protected void markAndSweep() throws Exception;
** protected void mark() throws Exception;
** protected void sweep() throws Exception;
* Helper Class - GarbageCollectorFileState
* Added the following methods to the GarbageCollectableBlobStore:
** Iterator<String> getAllChunkIds(long maxLastModifiedTime) throws Exception;
** boolean deleteChunk(String chunkId) throws Exception;
** Iterator<String> resolveChunks(String blobId) throws IOException;
* Added resolveChunks() implementation to AbstractBlobStore
* Added implementations for deleteChunk() and getAllChunkIds() to the following
BlobStore implementations:
** FileBlobStore
** MemoryBlobStore
** DbBlobStore
** MongoBlobStore
** RDBBlobStore
** CloudBlobStore - OAK-1157
** DataStoreBlobStore - OAK-1157
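As a rough picture of what a getAllChunkIds() / deleteChunk() pair could look
like, here is a small in-memory sketch. It is an assumption-laden
illustration, not any of the implementations listed above: the class
InMemoryChunkStoreSketch and its putChunk() helper are hypothetical, and the
sketch assumes that maxLastModifiedTime means "return only chunks last
modified at or before this time", presumably so that chunks written while a
collection is running are left alone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store; the real BlobStore classes are not shown here.
class InMemoryChunkStoreSketch {

    // chunk id -> last-modified timestamp in milliseconds
    private final Map<String, Long> chunks = new LinkedHashMap<>();

    /** Hypothetical helper to register a chunk with a timestamp. */
    void putChunk(String chunkId, long lastModified) {
        chunks.put(chunkId, lastModified);
    }

    /**
     * Returns the ids of chunks last modified at or before the cutoff.
     * The "at or before" semantics are an assumption of this sketch.
     */
    Iterator<String> getAllChunkIds(long maxLastModifiedTime) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> entry : chunks.entrySet()) {
            if (entry.getValue() <= maxLastModifiedTime) {
                ids.add(entry.getKey());
            }
        }
        // Iterate over a copy so callers may delete chunks while iterating.
        return ids.iterator();
    }

    /** Deletes a chunk; returns true only if the chunk existed. */
    boolean deleteChunk(String chunkId) {
        return chunks.remove(chunkId) != null;
    }
}
```

With this shape, a sweep can walk getAllChunkIds(cutoff), skip every id found
in the mark phase, and call deleteChunk() on the rest.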
> Data store garbage collection
> -----------------------------
>
> Key: OAK-377
> URL: https://issues.apache.org/jira/browse/OAK-377
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mk
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 0.16
>
>
> Unused binaries in the data store need to be garbage collected.
> There is a partial implementation in oak-mk, however it is currently not run
> (not run automatically, and I think there is no way to run it manually).
> Also, we might want to investigate faster garbage collection algorithms:
> young generation garbage collection, or garbage collection using reference
> counting (for example using an index of references to the data store).