Hi Matt,

On Thu, Apr 20, 2017 at 11:50 PM, Matt Ryan <o...@mvryan.org> wrote:
> Oak would then be
> able to choose which data store to use based on a number of criteria, like
> file size, JCR path, node type, existence of a node property, a node
> property value, or other items, or a combination of items.  In my thinking
> these are defined in configuration so the federated data store would know
> how to select which data store is used to store which binary.

This would need some more details. The way a binary gets written using
the JCR API is

1. Code creates a Binary using ValueFactory, say by spooling the stream.
By this time the binary has already been added to the DataStore.
2. The returned Binary reference is then stored as part of a JCR node by
setting the Binary property on it.
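
The ordering matters: the binary hits the data store in step 1, before any node references it. A minimal simulation of the two steps (the `DATA_STORE` map, `createBinary`, and `setBinaryProperty` are simplified stand-ins for the real ValueFactory/DataStore, not Oak API):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for ValueFactory/DataStore, only to show the ordering:
// the binary is persisted in step 1, before the node property is set in step 2.
public class BinaryWriteFlow {
    public static final Map<String, byte[]> DATA_STORE = new HashMap<>();

    // Step 1: "createBinary" analogue - spools the content into the data
    // store and returns a reference (the blob id).
    public static String createBinary(byte[] content) {
        String blobId = "blob-" + DATA_STORE.size();
        DATA_STORE.put(blobId, content); // already in the store at this point
        return blobId;
    }

    // Step 2: the returned reference is stored as a property of the node.
    public static Map<String, String> setBinaryProperty(Map<String, String> node,
                                                        String name, String blobId) {
        node.put(name, blobId);
        return node;
    }
}
```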

So making the storage of a Binary a function of the final node would
require some more thought. A federated store has two aspects:

1. Writing a binary - Destination store selection = f(node, path, user option)

2. Reading a binary - this would be simple, as the actual store
information would be encoded within the blobId (like a URL scheme), and
the BlobStore to be used for reading would then be selected based on
the scheme in the blobId
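
The read path could then be a simple lookup on the scheme prefix. A sketch (the scheme syntax `"s3:..."` and the resolver class are made up for illustration; stores are represented as plain strings):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: pick the store for a read based on the scheme
// encoded in the blob id, e.g. "s3:deadbeef" -> the store registered as "s3".
public class BlobStoreResolver {
    private final Map<String, String> storesByScheme = new HashMap<>();
    private final String defaultStore;

    public BlobStoreResolver(String defaultStore) {
        this.defaultStore = defaultStore;
    }

    public void register(String scheme, String store) {
        storesByScheme.put(scheme, store);
    }

    // Extract the scheme prefix from the blob id; ids without a scheme
    // (e.g. ids written before federation existed) fall back to the default.
    public String resolve(String blobId) {
        int idx = blobId.indexOf(':');
        if (idx < 0) return defaultStore;
        return storesByScheme.getOrDefault(blobId.substring(0, idx), defaultStore);
    }
}
```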

Further, the current Blob-related API is used in the following ways:

B1. Code logic dealing with blob creation - JCR ValueFactory,
NodeStore#createBlob. These only work with the BlobStore API
B2. Code logic dealing with blob GC - this uses the methods in
GarbageCollectableBlobStore

Amit added a BlobStore#writeBlob(InputStream, BlobOptions) as part of
OAK-5174. This can now be extended to support the federated use case.
One possible approach could be as follows:

1. The setup would have multiple BlobStore service implementations registered.
2. These services would have a "type" property defined to indicate the scheme.
3. The setup would have a default BlobStore and multiple secondary stores.
4. Any code in #B1 above would be dealing with a FederatedBlobStore,
aka the "master"/primary store.
5. The NodeStores would be bound to this "master" BlobStore.

FederatedBlobStore would use the default store for any Binary created
via NodeStore#createBlob. However, any call to
BlobStore#writeBlob(InputStream, BlobOptions) would be passed to the
other stores, which can indicate whether they can handle the call. If
yes, they would return the blob id. We could also look into exposing
the new method as part of the NodeStore API.
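
The delegation loop could look roughly like this. The nested `BlobStore` interface and the null-means-decline convention are simplified stand-ins, not the real Oak SPI; a real implementation would also have to avoid consuming the stream when a store declines:

```java
import java.io.InputStream;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the FederatedBlobStore write path described above.
public class FederatedBlobStoreSketch {
    public interface BlobStore {
        // Returns the blob id, or null if this store declines the write.
        String writeBlob(InputStream in, Map<String, Object> options);
    }

    private final BlobStore defaultStore;
    private final List<BlobStore> secondaryStores;

    public FederatedBlobStoreSketch(BlobStore defaultStore, List<BlobStore> secondaries) {
        this.defaultStore = defaultStore;
        this.secondaryStores = secondaries;
    }

    public String writeBlob(InputStream in, Map<String, Object> options) {
        // Offer the write to each secondary store; the first one that can
        // handle it (based on the options) returns a blob id.
        for (BlobStore store : secondaryStores) {
            String id = store.writeBlob(in, options);
            if (id != null) return id;
        }
        // No secondary store claimed it: fall back to the default store.
        return defaultStore.writeBlob(in, options);
    }
}
```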

OakValueFactory could then wrap the "context", i.e. path, node etc., as
part of BlobOptions, which could then be used for store selection.
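
With the context available, a store's "can I handle this?" check becomes a simple predicate over the options. A sketch (the `"path"` key and the `/content/media` rule are invented for illustration, and the options map stands in for BlobOptions):

```java
import java.util.Map;

// Hypothetical sketch: a secondary store decides from the wrapped context
// whether it should handle a binary, e.g. one dedicated to media paths.
public class StoreSelection {
    public static boolean handles(Map<String, Object> blobOptions) {
        Object path = blobOptions.get("path");
        return path instanceof String
                && ((String) path).startsWith("/content/media");
    }
}
```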

How this impacts the GC logic would also need some thought.

Chetan Mehrotra

PS: Above is more of a brain dump in thinking out loud mode :)
