[ 
https://issues.apache.org/jira/browse/OAK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898927#comment-13898927
 ] 

Thomas Mueller commented on OAK-377:
------------------------------------

Thanks a lot for the patch! It looks good, but I do have a few comments:

There is a "lock" file, and the algorithm checks whether GC is already running 
by checking the existence of the file. What if the process is killed while GC 
is running? Wouldn't the file still exist and block further GCs? Why do 
we need to verify if GC is already running? What would happen if it did run 
multiple times concurrently? Within what scope can we verify it (within the 
same process, within the same machine, for the given blob store)?

Dependency on com.google.code.externalsortinginjava: it is public domain, so I 
guess there is no license problem. Still, I wonder if we should just copy that 
class. We should still acknowledge we use this package, as described in the 
license.

About the field "seed", which indicates the starting time: I would rename it to 
"startTime". To me, seed sounds like a seed value for a random number generator 
:-)

About GarbageCollectorFileState: the root directory seems to be the current 
working directory ("."). I would avoid that, and use the temp directory 
instead. I would create all the files with File.createTempFile, and use 
File.deleteOnExit() in addition to deleting the files manually when GC ends / 
stops. I know deleteOnExit() is a bit problematic, but as we don't run garbage 
collection that often, it should be OK.
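
As a rough sketch of the temp-file suggestion (the class and method names 
below are made up for illustration and are not taken from the patch), the 
state files could be created along these lines:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper; not part of the patch or of Oak. */
class GcTempFiles {

    /**
     * Creates a scratch file in the JVM temp directory (java.io.tmpdir)
     * instead of the current working directory.
     * The prefix must be at least three characters long.
     */
    static File createScratchFile(String prefix) {
        try {
            File f = File.createTempFile(prefix, ".tmp");
            // Safety net only: covers the case where GC is interrupted
            // before the manual cleanup below has a chance to run.
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Manual cleanup when GC ends or stops; true if the file is gone. */
    static boolean cleanup(File f) {
        return !f.exists() || f.delete();
    }
}
```

Calling cleanup() in a finally block when GC ends keeps deleteOnExit() as a 
fallback rather than the primary mechanism.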

Threads: I would give each thread a name, and set the daemon flag, so the 
process can exit even if GC is still running (I assume that's what we want to 
do: stop GC when the application ends). 
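
Roughly what I have in mind for the threads, as a sketch only (the class 
name and the thread name in the usage line are assumptions, not identifiers 
from the patch):

```java
/** Hypothetical helper; not part of the patch or of Oak. */
class GcThreads {

    /**
     * Creates a named daemon thread for a GC task. The thread is not
     * started here; the caller decides when to call start().
     */
    static Thread newGcThread(String name, Runnable task) {
        Thread t = new Thread(task, name);
        // Daemon threads do not keep the JVM alive, so the process can
        // exit even if GC is still running. This must be set before start().
        t.setDaemon(true);
        return t;
    }
}
```

Usage would be something like 
GcThreads.newGcThread("oak-blob-gc", task).start(). The name then shows up in 
thread dumps, which makes a stuck GC run easier to spot.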

RDBBlobStore: I think you have reformatted a line unnecessarily, the one with 
"update datastore_meta set lastMod = ?".



> Data store garbage collection
> -----------------------------
>
>                 Key: OAK-377
>                 URL: https://issues.apache.org/jira/browse/OAK-377
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mk
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 0.17
>
>         Attachments: OAK-377.patch
>
>
> Unused binaries in the data store need to be garbage collected.
> There is a partial implementation in oak-mk, however it is currently not run 
> (not run automatically, and I think there is no way to run it manually).
> Also, we might want to investigate faster garbage collection algorithms: 
> young generation garbage collection, or garbage collection using reference 
> counting (for example using an index of references to the data store).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
