[ 
https://issues.apache.org/jira/browse/OAK-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gardner Buchanan updated OAK-3140:
----------------------------------
    Comment: was deleted

(was: I would advocate an approach based on the repository path, e.g. compute 
the MD5 of the path and place the blob accordingly.  Merely placing index files 
within the datastore according to their path rather than their content would 
immediately alleviate the bloat problem with indexes, simply because the file 
contents could be overwritten in place.  It might not even be necessary to do 
anything fancy about garbage collecting these.

GC, when it is needed, can follow the same pattern as the content-based 
approach: traverse the repo and make a list of the paths and their MD5 sums, 
then traverse the blobstore and keep only the items on the list.

I would also like to see the choice of blob store implementation made at the 
repository level, maybe via a mixin or heritable property.  Some other 
application-level functionality could benefit from cleanup in the same way as 
index binaries, such as workflow payloads and replication durbo files.  The 
approach used for indexes should generalize to these other use cases.)
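
The path-based placement and the GC pass described in the deleted comment 
could be sketched roughly as follows.  This is only an illustration under 
stated assumptions: the class PathKeyedBlobStore, its methods, and the 
two-character directory fan-out are hypothetical and are not part of Oak.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: blobs are keyed by the MD5 of their repository path,
// not by a hash of their content. None of these names exist in Oak.
final class PathKeyedBlobStore {
    private final Path root;

    PathKeyedBlobStore(Path root) {
        this.root = root;
    }

    // Hex-encoded MD5 of the repository path.
    static String keyFor(String repoPath) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(repoPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: <root>/<first two hex chars>/<full key>.
    private Path fileFor(String key) {
        return root.resolve(key.substring(0, 2)).resolve(key);
    }

    // Writing to the same repository path overwrites the earlier file in place.
    String write(String repoPath, byte[] content) throws IOException {
        String key = keyFor(repoPath);
        Path file = fileFor(key);
        Files.createDirectories(file.getParent());
        Files.write(file, content);
        return key;
    }

    byte[] read(String repoPath) throws IOException {
        return Files.readAllBytes(fileFor(keyFor(repoPath)));
    }

    // Mark: hash every live path. Sweep: delete files whose name is not marked.
    int collectGarbage(Iterable<String> liveRepoPaths) throws IOException {
        Set<String> keep = new HashSet<>();
        for (String p : liveRepoPaths) {
            keep.add(keyFor(p));
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path f : files) {
            if (!keep.contains(f.getFileName().toString())) {
                Files.delete(f);
                deleted++;
            }
        }
        return deleted;
    }
}
```

In this sketch, rewriting an index file at the same path replaces the old 
blob rather than adding a new one, which is the property the comment relies 
on to avoid bloat.  The sweep only matters for paths that have been removed 
from the repository.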

> DataStore / BlobStore: add a method to pass a "type" when writing
> -----------------------------------------------------------------
>
>                 Key: OAK-3140
>                 URL: https://issues.apache.org/jira/browse/OAK-3140
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>              Labels: performance
>
> Currently, the BlobStore interface has a method "String writeBlob(InputStream 
> in)". This issue is about adding a new method "String writeBlob(String type, 
> InputStream in)", for the following reasons (in no particular order):
> * Store some binaries (for example Lucene index files) in a different place, 
> in order to safely and quickly run garbage collection just on those files.
> * Store some binaries in slow storage and some in fast storage (or locations).
> * Disable calculating the content hash (de-duplication) for some binaries.
> * Store some binaries in a shared storage (for fast cross-repository 
> copying), and some in local storage.
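
As an illustration of the proposal quoted above, the following sketch shows 
how a "type" argument might be used to route binaries to different backends.  
Only the two writeBlob signatures come from the issue text.  The names 
TypedBlobStore, MemoryBlobStore and RoutingBlobStore, and the type string 
"lucene-index", are assumptions made for this example and are not Oak APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface carrying the two signatures quoted in the issue.
interface TypedBlobStore {
    String writeBlob(InputStream in) throws IOException;

    String writeBlob(String type, InputStream in) throws IOException;
}

// Trivial in-memory backend, for illustration only.
final class MemoryBlobStore implements TypedBlobStore {
    private final String name;
    private final Map<String, byte[]> blobs = new HashMap<>();
    private int counter;

    MemoryBlobStore(String name) {
        this.name = name;
    }

    @Override
    public String writeBlob(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        // Sequential ids instead of a content hash: no de-duplication here.
        String id = name + ":" + (counter++);
        blobs.put(id, out.toByteArray());
        return id;
    }

    @Override
    public String writeBlob(String type, InputStream in) throws IOException {
        return writeBlob(in); // this backend ignores the type
    }

    int size() {
        return blobs.size();
    }
}

// Sends each write to the backend registered for its type, else to a fallback.
final class RoutingBlobStore implements TypedBlobStore {
    private final TypedBlobStore fallback;
    private final Map<String, TypedBlobStore> byType = new HashMap<>();

    RoutingBlobStore(TypedBlobStore fallback) {
        this.fallback = fallback;
    }

    RoutingBlobStore route(String type, TypedBlobStore target) {
        byType.put(type, target);
        return this;
    }

    @Override
    public String writeBlob(InputStream in) throws IOException {
        return fallback.writeBlob(in);
    }

    @Override
    public String writeBlob(String type, InputStream in) throws IOException {
        return byType.getOrDefault(type, fallback).writeBlob(type, in);
    }
}
```

With a layout like this, garbage collection for index files could run 
against the backend registered for the index type alone, and writes with an 
unregistered type would still land in the fallback store.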



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
