[ https://issues.apache.org/jira/browse/OAK-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gardner Buchanan updated OAK-3140:
----------------------------------
Comment: was deleted
(was: I would advocate an approach based on the repository path, e.g. compute
the MD5 of the path and place the blob accordingly. Merely placing index files
within the datastore according to their path rather than their content would
immediately alleviate the bloat problem with indexes, simply because the file
contents could be overwritten in place. It might not even be necessary to do
anything fancy about garbage collecting these.

GC, when it is needed, can follow the same pattern as with the content-based
approach: traverse the repo and make a list of the paths and their MD5 sums,
then traverse the blobstore and keep only the items on the list.

I would also like to see the choice of blob store implementation made at the
repository level, maybe via a mixin or inheritable property. Some other
application-level functionality could benefit from cleanup in the same way as
index binaries, such as workflow payloads and replication durbo files. The
approach used for indexes should generalize to these other use cases.)
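
The path-based placement and GC pattern described in the deleted comment could
be sketched roughly as follows. This is only an illustration, not Oak code:
the class name PathKeyedBlobLocation, the two-level directory layout, and the
sweepCandidates helper are all hypothetical, and a real blob store would work
with files or cloud objects rather than plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: derives a blob location from the repository path
// instead of from the blob content.
public class PathKeyedBlobLocation {

    // Hex-encoded MD5 digest of the UTF-8 bytes of the repository path.
    static String md5Hex(String repositoryPath) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(repositoryPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: two directory levels taken from the first four hex
    // digits. The same path always maps to the same location, so a rewritten
    // index file overwrites its previous version in place.
    static String locationFor(String repositoryPath) {
        String hex = md5Hex(repositoryPath);
        return hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + hex;
    }

    // GC sketch: compute the locations of all live paths, then report every
    // stored location that is not on that list.
    static Set<String> sweepCandidates(Collection<String> livePaths,
                                       Collection<String> storedLocations) {
        Set<String> live = new HashSet<>();
        for (String path : livePaths) {
            live.add(locationFor(path));
        }
        Set<String> garbage = new TreeSet<>(storedLocations);
        garbage.removeAll(live);
        return garbage;
    }
}
```

Because the key depends only on the path, writing a new version of the same
file reuses the old location, which is what removes the need for frequent GC
in this scheme. The trade-off is that content-based de-duplication is lost for
blobs stored this way.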
> DataStore / BlobStore: add a method to pass a "type" when writing
> -----------------------------------------------------------------
>
> Key: OAK-3140
> URL: https://issues.apache.org/jira/browse/OAK-3140
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: blob
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Labels: performance
>
> Currently, the BlobStore interface has a method "String writeBlob(InputStream
> in)". This issue is about adding a new method "String writeBlob(String type,
> InputStream in)", for the following reasons (in no particular order):
> * Store some binaries (for example Lucene index files) in a different place,
> in order to safely and quickly run garbage collection just on those files.
> * Store some binaries in slow storage and others in fast storage or locations.
> * Disable calculating the content hash (de-duplication) for some binaries.
> * Store some binaries in a shared storage (for fast cross-repository
> copying), and some in local storage.
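
The proposed overload can be illustrated with a small in-memory sketch. Only
the two writeBlob signatures come from the issue text. Everything else is an
assumption made for illustration: the names TypedBlobStoreSketch,
TypedBlobStore and InMemoryStore, the "default" type, the id format, and the
count/clear helpers. It is not the actual Oak BlobStore interface or
implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

public class TypedBlobStoreSketch {

    // Hypothetical interface carrying the existing and the proposed method.
    interface TypedBlobStore {
        String writeBlob(InputStream in) throws IOException;
        String writeBlob(String type, InputStream in) throws IOException;
    }

    // Toy store that keeps each type in its own area, so one area
    // (for example Lucene index files) can be cleaned up on its own.
    static class InMemoryStore implements TypedBlobStore {
        static final String DEFAULT_TYPE = "default"; // assumed name
        private final Map<String, Map<String, byte[]>> areas = new HashMap<>();
        private long counter;

        @Override
        public String writeBlob(InputStream in) throws IOException {
            // The untyped method delegates to the default area.
            return writeBlob(DEFAULT_TYPE, in);
        }

        @Override
        public String writeBlob(String type, InputStream in) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // No content hash is computed here; the id is just a counter
            // prefixed with the type.
            String id = type + ":" + (counter++);
            areas.computeIfAbsent(type, t -> new HashMap<>()).put(id, out.toByteArray());
            return id;
        }

        // Number of blobs currently stored under the given type.
        int count(String type) {
            Map<String, byte[]> area = areas.get(type);
            return area == null ? 0 : area.size();
        }

        // Drops every blob of one type without touching the other areas.
        void clear(String type) {
            areas.remove(type);
        }
    }
}
```

A caller would write index files with something like
writeBlob("lucene", in) and everything else with writeBlob(in). Cleaning up
the "lucene" area then never has to scan the default area.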
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)