Hi,

It is important to understand which operations are available in the JCR API, 
the DataStore API, and the concept of revisions we use for Oak. For example, 

* The DataStore API doesn’t support updating a binary.
* A node might have multiple revisions.
* In the Oak revision model, you can't update a reference of an old revision.
* The JCR API allows creating binaries without nodes via ValueFactory (so it's 
not possible to apply storage filters at that time: there is no node or 
property yet that a filter could evaluate).

What you didn't address is how to read a blob if there are multiple possible 
storage locations, so I assume you didn't think about that case. In my view, 
this should be supported. You might want to read up on LSM trees to see how 
that can be done, for example using Bloom filters.
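To make the Bloom filter suggestion concrete, here is a minimal Java sketch,
assuming each delegate keeps a Bloom filter of the blob ids written to it. A
read probes only the delegates whose filter might contain the id. A Bloom
filter can report false positives but never false negatives, so a delegate it
rules out can be skipped without any I/O. The names `BlobIdBloomFilter` and
`BloomFilteredReader`, the filter size, and the hash count are hypothetical
and not part of Oak.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal Bloom filter over blob ids: "false" is definite, "true" means "maybe". */
class BlobIdBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BlobIdBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive hashCount bit positions from two 32-bit values (double hashing).
    private int[] positions(String blobId) {
        byte[] d = sha256(blobId);
        int h1 = toInt(d, 0);
        int h2 = toInt(d, 4);
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            result[i] = Math.floorMod(h1 + i * h2, size);
        }
        return result;
    }

    void add(String blobId) {
        for (int p : positions(blobId)) bits.set(p);
    }

    boolean mightContain(String blobId) {
        for (int p : positions(blobId)) {
            if (!bits.get(p)) return false; // one unset bit proves the id was never added
        }
        return true;
    }

    private static int toInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Hypothetical: one filter per delegate, kept in delegate priority order. */
class BloomFilteredReader {
    private final Map<String, BlobIdBloomFilter> filters = new LinkedHashMap<>();

    void register(String delegate) {
        filters.put(delegate, new BlobIdBloomFilter(1 << 16, 5));
    }

    void recordWrite(String delegate, String blobId) {
        filters.get(delegate).add(blobId);
    }

    /** Delegates that may hold the blob, in priority order; all others can be skipped. */
    List<String> candidates(String blobId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BlobIdBloomFilter> e : filters.entrySet()) {
            if (e.getValue().mightContain(blobId)) result.add(e.getKey());
        }
        return result;
    }
}
```

Standard Bloom filters cannot remove entries, so after DataStore garbage
collection a delegate's filter would need to be rebuilt (or a counting
variant used) to stay selective.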

Suggested readings:
* https://docs.adobe.com/content/docs/en/spec/jsr170/javadocs/jcr-2.0/index.html
* https://docs.adobe.com/content/docs/en/spec/jcr/1.0/index.html
* https://en.wikipedia.org/wiki/Content-addressable_storage
* https://en.wikipedia.org/wiki/Log-structured_merge-tree

Regards,
Thomas



On 15.08.17, 08:00, "Thomas Mueller" <[email protected]> wrote:

    Hi,
    
    I read your wiki update, and this caught my eye:
    
    > If a match is found, the write is treated as an update; if no match is
    > found, the write is treated as a create.
    
    In the DataStore, there is no such thing as an update. There are only the
    following operations:
    
    * write
    * read
    * delete, via garbage collection
    
    See also https://en.wikipedia.org/wiki/Content-addressable_storage
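    Those three operations can be illustrated with a hypothetical in-memory
    content-addressed store in Java (`ContentAddressedStore` is not Oak's
    DataStore API, and SHA-256 is an assumption for illustration). Because the
    id is derived from the content, "changing" a binary means writing new
    content under a new id; the old blob stays readable until garbage
    collection removes ids that are no longer referenced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical content-addressed store: write, read, and delete via GC only. */
class ContentAddressedStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** The id is a hash of the content, so identical content always gets the same id. */
    String write(byte[] content) {
        String id = sha256Hex(content);
        blobs.putIfAbsent(id, content.clone()); // an existing blob is never modified: no "update"
        return id;
    }

    Optional<byte[]> read(String id) {
        byte[] b = blobs.get(id);
        return b == null ? Optional.empty() : Optional.of(b.clone());
    }

    /** Deletion happens only here: drop every blob whose id is not still referenced. */
    int collectGarbage(Set<String> referencedIds) {
        int before = blobs.size();
        blobs.keySet().retainAll(referencedIds);
        return before - blobs.size();
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte x : digest) sb.append(String.format("%02x", x));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```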
    
    Regards,
    Thomas
    
    
    On 14.08.17, 17:17, "Matt Ryan" <[email protected]> wrote:
    
        Bump.  If anyone has feedback, I’d love to hear it.
        
        
        On August 3, 2017 at 6:27:39 PM, Matt Ryan ([email protected]) wrote:
        
        Hi,
        
        I’ve been thinking the past few days about how a composite blob store
        might go about prioritizing the delegate blob stores for reading and
        writing, considering concepts like storage filters on a blob store,
        read-only blob stores, and archive or “cold” blob stores (which we don’t
        currently have, but could in the future).
        
        Storage filters basically restrict what can be stored in a delegate - for
        example, saying that only blobs with a certain JCR property can be stored
        there.  (I realize there are implications with this too - I’ll worry
        about that in a separate thread someday.)
        
        I’d like feedback on the following idea:
        - Create a new public interface in Oak that can be injected into the
        composite blob store and used to handle the delegate prioritization for
        reads and writes.
        - Create a default implementation of this interface that can be used in
        most cases (see below).
        
        This would make this area extensible, so that new or more custom
        algorithms could be implemented for future use cases as needed, without
        tying them to configuration.
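        A minimal Java sketch of that idea, assuming constructor injection:
        the composite holds no ordering logic and only asks the injected
        strategy for an order. `DelegatePrioritizer`, `DelegateInfo`,
        `WritableFirstPrioritizer`, and `CompositeBlobStoreSketch` are
        hypothetical names, not existing Oak types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical description of a delegate, as a prioritizer sees it. */
final class DelegateInfo {
    final String name;
    final boolean readOnly;

    DelegateInfo(String name, boolean readOnly) {
        this.name = name;
        this.readOnly = readOnly;
    }
}

/** Hypothetical pluggable strategy; not an existing Oak interface. */
interface DelegatePrioritizer {
    /** Delegates to try, in order, when reading the given blob. */
    List<DelegateInfo> forRead(String blobId, List<DelegateInfo> delegates);

    /** Delegates to try, in order, when writing a new blob. */
    List<DelegateInfo> forWrite(List<DelegateInfo> delegates);
}

/** Hypothetical example strategy: writable delegates first for reads; never write to read-only ones. */
final class WritableFirstPrioritizer implements DelegatePrioritizer {
    @Override
    public List<DelegateInfo> forRead(String blobId, List<DelegateInfo> delegates) {
        List<DelegateInfo> order = new ArrayList<>();
        for (DelegateInfo d : delegates) if (!d.readOnly) order.add(d);
        for (DelegateInfo d : delegates) if (d.readOnly) order.add(d);
        return order;
    }

    @Override
    public List<DelegateInfo> forWrite(List<DelegateInfo> delegates) {
        List<DelegateInfo> order = new ArrayList<>();
        for (DelegateInfo d : delegates) if (!d.readOnly) order.add(d);
        return order;
    }
}

/** Hypothetical composite: it delegates all ordering decisions to the injected strategy. */
final class CompositeBlobStoreSketch {
    private final List<DelegateInfo> delegates;
    private final DelegatePrioritizer prioritizer;

    CompositeBlobStoreSketch(List<DelegateInfo> delegates, DelegatePrioritizer prioritizer) {
        this.delegates = List.copyOf(delegates);
        this.prioritizer = Objects.requireNonNull(prioritizer);
    }

    List<String> readPlan(String blobId) {
        return names(prioritizer.forRead(blobId, delegates));
    }

    List<String> writePlan() {
        return names(prioritizer.forWrite(delegates));
    }

    private static List<String> names(List<DelegateInfo> ds) {
        List<String> out = new ArrayList<>();
        for (DelegateInfo d : ds) out.add(d.name);
        return out;
    }
}
```

        A more custom ordering would be supplied by passing a different
        `DelegatePrioritizer` implementation to the constructor, with no
        configuration involved.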
        
        The default implementation would basically be this:
        - For reads:
          - Delegates with storage filters first
          - Delegates without storage filters next
          - Read-only delegates next (with filters first, then without)
          - Retry reads on delegates with filters that were previously skipped
            (this is a special case)
          - Cold storage delegates last

        - For writes:
          - Search for an existing blob first using the “read” algorithm; always
            update an existing blob if one is found (except in cold storage)
          - If not found:
            - Try delegates with storage filters first
            - Delegates without storage filters next
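        The default ordering could be sketched in Java as below. It assumes
        each delegate exposes whether it has a storage filter, is read-only,
        or is cold, and that the blob's properties are available when filters
        are evaluated (which Thomas's reply above questions for binaries
        created via ValueFactory). "Delegates with storage filters" is read as
        "delegates whose filter accepts this blob"; filtered delegates that
        reject it fall through to the retry step. `Delegate`,
        `DefaultPrioritizer`, and the `holdsBlob` check are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical delegate descriptor; filter == null means "no storage filter". */
final class Delegate {
    final String name;
    final Predicate<Map<String, String>> filter;
    final boolean readOnly;
    final boolean cold;

    Delegate(String name, Predicate<Map<String, String>> filter, boolean readOnly, boolean cold) {
        this.name = name;
        this.filter = filter;
        this.readOnly = readOnly;
        this.cold = cold;
    }

    boolean hasFilter() { return filter != null; }

    boolean accepts(Map<String, String> props) { return filter == null || filter.test(props); }
}

/** Hypothetical sketch of the proposed default ordering; each delegate appears at most once. */
final class DefaultPrioritizer {

    List<Delegate> readOrder(List<Delegate> all, Map<String, String> props) {
        List<Delegate> order = new ArrayList<>();
        // 1. writable delegates whose storage filter accepts this blob
        addIf(order, all, d -> !d.cold && !d.readOnly && d.hasFilter() && d.accepts(props));
        // 2. writable delegates without storage filters
        addIf(order, all, d -> !d.cold && !d.readOnly && !d.hasFilter());
        // 3. read-only delegates: accepting filters first, then unfiltered
        addIf(order, all, d -> !d.cold && d.readOnly && d.hasFilter() && d.accepts(props));
        addIf(order, all, d -> !d.cold && d.readOnly && !d.hasFilter());
        // 4. retry filtered delegates skipped above because their filter rejects the blob
        addIf(order, all, d -> !d.cold && d.hasFilter() && !d.accepts(props));
        // 5. cold storage last
        addIf(order, all, d -> d.cold);
        return order;
    }

    /** holdsBlob is a hypothetical existence check against one delegate. */
    List<Delegate> writeOrder(List<Delegate> all, Map<String, String> props,
                              Predicate<Delegate> holdsBlob) {
        // Search for an existing copy in read order; cold and read-only delegates are not written to.
        for (Delegate d : readOrder(all, props)) {
            if (!d.cold && !d.readOnly && holdsBlob.test(d)) return List.of(d);
        }
        List<Delegate> order = new ArrayList<>();
        addIf(order, all, d -> !d.cold && !d.readOnly && d.hasFilter() && d.accepts(props));
        addIf(order, all, d -> !d.cold && !d.readOnly && !d.hasFilter());
        return order;
    }

    private static void addIf(List<Delegate> out, List<Delegate> all, Predicate<Delegate> p) {
        for (Delegate d : all) if (p.test(d)) out.add(d);
    }
}
```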
        
        The special case to retry reads on delegates with filters that were
        previously skipped is there to handle configuration changes.
        Essentially, if a blob is stored in a delegate blob store, and then the
        configuration for that delegate changes so that the blob wouldn’t be
        stored there if it were being written now, we want to be able to locate
        it between the time the configuration change happens and the time some
        background curator moves the blob to the correct location.
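        The retry pass can be shown in isolation with a small Java sketch,
        assuming in-memory delegates whose filter can be swapped at runtime to
        simulate a configuration change (`FilteredDelegate` and
        `RetryingLookup` are hypothetical). The first pass consults only
        delegates whose current filter accepts the blob; the second pass
        checks the delegates it skipped, where a blob written under the old
        configuration would still be found.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical in-memory delegate whose storage filter can be reconfigured. */
final class FilteredDelegate {
    final String name;
    Predicate<Map<String, String>> filter; // null = no filter; mutable to simulate a config change
    final Set<String> blobIds = new HashSet<>();

    FilteredDelegate(String name, Predicate<Map<String, String>> filter) {
        this.name = name;
        this.filter = filter;
    }

    boolean accepts(Map<String, String> props) { return filter == null || filter.test(props); }
}

/** Hypothetical two-pass lookup illustrating the "retry skipped delegates" special case. */
final class RetryingLookup {
    static Optional<FilteredDelegate> find(List<FilteredDelegate> delegates, String blobId,
                                           Map<String, String> props, boolean retrySkipped) {
        // First pass: only delegates whose *current* filter accepts this blob.
        for (FilteredDelegate d : delegates) {
            if (d.accepts(props) && d.blobIds.contains(blobId)) return Optional.of(d);
        }
        if (!retrySkipped) return Optional.empty();
        // Second pass: delegates skipped above; the blob may still be there
        // until a background curator moves it.
        for (FilteredDelegate d : delegates) {
            if (!d.accepts(props) && d.blobIds.contains(blobId)) return Optional.of(d);
        }
        return Optional.empty();
    }
}
```

        For example, if `hot` accepted images when `blob-1` was written and is
        then reconfigured to accept only videos, the first pass alone misses
        `blob-1`, while the retry pass finds it in `hot`.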
        
        
        So in short, I’d do the default implementation as described, but a
        different implementation could be injected instead, if someone wanted a
        more custom one.
        
        
        WDYT?
        
        
        -MR
        
    
    
