[
https://issues.apache.org/jira/browse/OAK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018616#comment-14018616
]
Thomas Mueller edited comment on OAK-1849 at 7/7/14 9:34 AM:
-------------------------------------------------------------
What you describe above is the solution we used to share data stores in
Jackrabbit 2.x. For the FileDataStore, we used the lastModified field of each
file. For large data stores, updating this field takes quite a long time, as
the metadata of every file needs to be changed. In the past, this turned out to
be a performance problem.
To speed up garbage collection, I suggest we use a slightly different mechanism
(except in cases where we share a datastore with a Jackrabbit 2.x repository):
# We use {{collectGarbage(boolean markOnly)}} - same as what you described
above. If the flag is {{true}}, the list of used blob ids is written to a flat
file in the root directory of the data store (using a random file name) during
or at the end of the {{mark}} phase.
# If {{markOnly}} is {{false}}, the {{sweep()}} method needs to additionally
check the root directory of the data store, and process all flat files stored
there, combining the lists if there are multiple. Blobs whose ids appear in the
list(s) must not be deleted. At the end of the sweep phase, the processed files
may be removed.
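The two steps above can be sketched roughly as follows. This is only a toy
illustration, not Oak or Jackrabbit code: the class {{SharedGcSketch}}, the
{{gc-refs-}} file name prefix, the {{addBlob}} helper, and the flat layout (one
file per blob id directly in the root directory) are all assumptions made for
the example. A real FileDataStore uses nested directories and would also need
to protect blobs added while GC is running.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

/** Toy shared data store GC. All names are hypothetical, not Oak API. */
class SharedGcSketch {
    // Assumed prefix that distinguishes reference lists from blob files.
    static final String REFS_PREFIX = "gc-refs-";

    /** Stores an empty blob file named after its id (flat layout assumed). */
    static void addBlob(Path root, String blobId) {
        try {
            Files.createFile(root.resolve(blobId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * markOnly == true: write the used blob ids to a randomly named flat file
     * in the data store root and delete nothing.
     * markOnly == false: sweep, honouring every flat file found in the root.
     * Returns the ids of the blobs that were deleted.
     */
    static Set<String> collectGarbage(Path root, Collection<String> usedBlobIds,
                                      boolean markOnly) {
        try {
            if (markOnly) {
                Files.write(root.resolve(REFS_PREFIX + UUID.randomUUID()), usedBlobIds);
                return new TreeSet<>();
            }
            return sweep(root, usedBlobIds);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<String> sweep(Path root, Collection<String> locallyUsed)
            throws IOException {
        Set<String> inUse = new HashSet<>(locallyUsed);
        List<Path> refFiles = new ArrayList<>();
        List<Path> blobs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                boolean isRefs = entry.getFileName().toString().startsWith(REFS_PREFIX);
                (isRefs ? refFiles : blobs).add(entry);
            }
        }
        // Combine the lists written by all repositories sharing the store.
        for (Path refFile : refFiles) {
            inUse.addAll(Files.readAllLines(refFile));
        }
        Set<String> deleted = new TreeSet<>();
        for (Path blob : blobs) {
            String id = blob.getFileName().toString();
            if (!inUse.contains(id)) {
                Files.delete(blob);
                deleted.add(id);
            }
        }
        // Remove only the lists that were actually processed in this sweep.
        for (Path refFile : refFiles) {
            Files.delete(refFile);
        }
        return deleted;
    }
}
```

In this sketch, one repository would call {{collectGarbage(root, ids, true)}}
to publish its references, and another would later call it with {{false}} to
run the sweep. A list written after the sweep has scanned the directory is left
in place for the next run, because only the processed files are removed.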
was (Author: tmueller):
What you describe above is the solution we had for Jackrabbit 2.x data stores,
to share data stores. For the FileDataStore, we used the lastModified field of
the file. For large data stores, updating the field takes quite a long time, as
the metadata of each file needs to be changed. In the past, this turned out to
be a performance problem.
To speed up garbage collection, I suggest we use a slightly different mechanism
(unless for cases where we share a datastore with a Jackrabbit 2.x repository):
# We use {{collectGarbage(boolean markOnly)}} - same as what you described
above. If the flag is {{true}}, the list of used blob ids are written to a flat
file in the root directory of the data store (using a random file name) during
or at the end of the {{mark}} phase.
# If {{markOnly}} if {{false}}, the {{sweep()}} method needs to additionally
check the root directory of the data store, and process all flat files stored
there, combining the lists if there are multiple. Entries in the list(s) must
not be deleted. At the end of the sweep phase, the processed files may be
removed.
> DataStore GC support for heterogeneous deployments using a shared datastore
> ---------------------------------------------------------------------------
>
> Key: OAK-1849
> URL: https://issues.apache.org/jira/browse/OAK-1849
> Project: Jackrabbit Oak
> Issue Type: Bug
> Reporter: Amit Jain
>
> If the deployment is such that there are 2 or more different instances with a
> shared datastore, triggering Datastore GC from one instance will result in
> blobs used by another instance getting deleted, causing data loss.
--
This message was sent by Atlassian JIRA
(v6.2#6252)