[
https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898936#comment-13898936
]
Amit Jain commented on OAK-377:
-------------------------------
>>There is a "lock" file, and the algorithm checks whether GC is already
>>running by checking the existence of the file. What if the process is killed
>>while GC is running, wouldn't the file still exist, and further GCs be
>>blocked? Why do we need to verify if GC is already running? What would happen
>>if it did run multiple times concurrently? Within what scope can we verify
>>it (within the same process, within the same machine, for the given blob
>>store)?
Yes, GC would be blocked if the process is killed.
If GC did run multiple times concurrently, I think some deletions of blobs
recognized as garbage could fail (because a blob may already have been deleted
by a concurrent process). This does not look fatal, as long as the deletion
exceptions are handled properly. Whether GC is already running can be verified
within the scope of the blob store being used.
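As a rough sketch of what "properly handled" could look like, assuming a
file-based blob store where each garbage blob is a file on disk. The class and
method names (TolerantDelete, deleteAll) are hypothetical and are not taken
from the patch or from Oak's API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TolerantDelete {

    /**
     * Deletes each candidate file and treats "already gone" as harmless.
     * Returns the number of files this call actually removed.
     */
    public static int deleteAll(List<Path> candidates) {
        int deleted = 0;
        for (Path p : candidates) {
            try {
                // deleteIfExists returns false, without throwing, when a
                // concurrent process has already removed the file.
                if (Files.deleteIfExists(p)) {
                    deleted++;
                }
            } catch (IOException e) {
                // Any other failure is reported and skipped, so that one
                // problematic blob does not abort the whole sweep.
                System.err.println("Could not delete " + p + ": " + e);
            }
        }
        return deleted;
    }
}
```

With this shape, two concurrent sweeps over the same candidates would each
finish normally; they would just report different deletion counts.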
The other reason I kept the lock file concept was to avoid unnecessarily
burdening the system by running multiple GC cycles concurrently.
We might be able to mitigate some of the problems associated with a stale lock
file by flagging it with deleteOnExit(), as you suggest.
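A minimal sketch of the lock file idea with deleteOnExit(), to make the
discussion concrete. The class name GcLockFile and the file name "gc.lock" are
assumptions for illustration and do not come from the patch:

```java
import java.io.File;
import java.io.IOException;

public class GcLockFile {

    private final File lock;

    public GcLockFile(File root) {
        // "gc.lock" is a hypothetical name, chosen only for this sketch.
        this.lock = new File(root, "gc.lock");
    }

    /**
     * Returns true if this caller created the lock file, false if the
     * file already existed (another GC is assumed to be running).
     */
    public boolean tryAcquire() throws IOException {
        // createNewFile checks for existence and creates the file in one
        // atomic step, so two callers cannot both succeed.
        if (!lock.createNewFile()) {
            return false;
        }
        // Requests removal when the JVM terminates normally.
        lock.deleteOnExit();
        return true;
    }

    /** Removes the lock file once the GC cycle is finished. */
    public void release() {
        lock.delete();
    }
}
```

Note that deleteOnExit() only takes effect on normal JVM termination, so a
forcibly killed process would still leave the file behind. That is why it
mitigates the problem rather than solving it.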
>> About GarbageCollectorFileState: the root directory seems to be the current
>> working directory (".").
The root used is whatever the caller passes in; if the caller does not provide
one, it is set to the system temp directory. I think the problem is the unit
test, which is passing ".". I will change that.
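The fallback described above could be sketched roughly as follows. GcRoot and
resolveRoot are hypothetical names, not the ones used in
GarbageCollectorFileState:

```java
import java.io.File;

public class GcRoot {

    /**
     * Returns the directory given by the caller, or the system temp
     * directory when the caller passes null or an empty string.
     */
    public static File resolveRoot(String root) {
        if (root == null || root.isEmpty()) {
            // java.io.tmpdir is the standard JVM property for the temp dir.
            return new File(System.getProperty("java.io.tmpdir"));
        }
        return new File(root);
    }
}
```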
I will change the rest as suggested.
> Data store garbage collection
> -----------------------------
>
> Key: OAK-377
> URL: https://issues.apache.org/jira/browse/OAK-377
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mk
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 0.17
>
> Attachments: OAK-377.patch
>
>
> Unused binaries in the data store need to be garbage collected.
> There is a partial implementation in oak-mk, however it is currently not run
> (not run automatically, and I think there is no way to run it manually).
> Also, we might want to investigate faster garbage collection algorithms:
> young generation garbage collection, or garbage collection using reference
> counting (for example using an index of references to the data store).
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)