I put together a very crude initial POC which can be seen at [0]. This simply allows a FileDataStore to be used as a delegate data store and the FederatedDataStore to be used in Oak as the primary data store.
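
To give a flavor of how the POC creates delegates without compile-time coupling, here is a minimal sketch of a name-to-class lookup followed by reflective instantiation. It is not the code at [0]: DelegateFactory and the short names are made up for illustration, and JDK collection classes stand in for real data store classes so the snippet runs without Oak on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolves a short delegate name to a fully qualified
 * class name and instantiates it reflectively, so this class has no
 * compile-time dependency on any concrete delegate type.
 */
final class DelegateFactory {

    // Assumed mapping. JDK classes are placeholders for real data store
    // classes; only names listed here are "supported" delegates.
    private static final Map<String, String> SUPPORTED = new HashMap<>();

    static {
        SUPPORTED.put("list", "java.util.ArrayList");
        SUPPORTED.put("map", "java.util.HashMap");
    }

    private DelegateFactory() {
    }

    /** Creates a delegate by short name, or fails if the name is not supported. */
    static Object create(String shortName) {
        String className = SUPPORTED.get(shortName);
        if (className == null) {
            throw new IllegalArgumentException("Unsupported delegate: " + shortName);
        }
        try {
            // Requires a public no-argument constructor on the target class.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create delegate: " + className, e);
        }
    }
}
```

Calling DelegateFactory.create("map") returns a new java.util.HashMap, while an unknown name such as "s3" is rejected with an IllegalArgumentException. That rejection is the "supported delegates only" limitation in miniature.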
The approach is simply that the FederatedDataStore has information about the delegates (one primary and zero or more secondaries) and can defer all actions to the appropriate delegate. The goal of this POC was to determine whether this simple idea could possibly work.

I'm doing an internal mapping from a simple data store name to a fully qualified class name, and then using reflection to create the data store. This prevents coupling between the FederatedDataStore and other data stores, but it also limits the FederatedDataStore to working only with supported data store delegates.

One question I have is about the basic correctness of this approach. Is it acceptable to create the data store objects directly (e.g. OakCachingFDS), or should the service be going through OSGi to create other data store service objects instead (e.g. FileDataStoreService)? I have a concern that creating service objects may mean OSGi limits me to a single service, whereas if we create the data store objects directly we could have a number of them. For example, we could have multiple S3DataStore objects, each with a different bucket for a different purpose. But I'm not sure if that limitation on service objects really exists.

Thoughts?

[0] - https://github.com/mattvryan/jackrabbit-oak/tree/federated-data-store/oak-blob-federated/src/main/java/org/apache/jackrabbit/oak/blob/federated

-MR

On Thu, Apr 20, 2017 at 12:20 PM, Matt Ryan <[email protected]> wrote:

> Hi,
>
> I'm looking at the possibility of creating a new kind of data store, let's
> call it a federated data store, and wanted to see what everyone thinks
> about this.
>
> The basic idea is that the federated data store would allow for more than
> one data store to be configured for an Oak instance. Oak would then be
> able to choose which data store to use based on a number of criteria, like
> file size, JCR path, node type, existence of a node property, a node
> property value, or other items, or a combination of items.
> In my thinking these are defined in configuration so the federated data
> store would know how to select which data store is used to store which
> binary.
>
> I think this is a step towards UC14 - Hierarchical BlobStore in [0]. Once
> the federated data store was implemented we should be able to support UC14
> with little work. I can also foresee other possible capabilities it could
> offer, such as storing blobs for different node types in different data
> stores, or choosing from a few different data stores based on geographic
> location (UC2 in [0]).
>
> In my mind we could add capability to DataStoreBlobStore.writeStream()
> where the decision is made whether to write a stream to the data store
> delegate or put it in-memory. Instead we could defer the decision directly
> to the delegate, adding a method to the appropriate interface (BlobStore or
> GarbageCollectibleBlobStore) to handle this decision, and default the
> decision in AbstractBlobStore to be based on the record size (which is the
> current behavior, except currently that decision is made in
> DataStoreBlobStore IIUC). All other existing data stores should then
> behave the same. But in the case of the federated data store this decision
> would be more involved, selecting the right data store based on
> configuration.
>
> The federated data store would need to exist independent of other data
> stores, so figuring out how to create those data stores without having a
> code dependency would be a challenge to figure out.
>
>
> Please let me know what you think, is my idea about the implementation
> flawed, is there a better way to accomplish this, what concerns are there
> about it, etc. I'd like to brainstorm with the list something that can
> work in this area and then I'll create a ticket for it. Or I can create
> the ticket, and we can have the discussion in the ticket. Let me know
> which is best.
>
>
> [0] - https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase
>
>
> - Matt Ryan
>
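
Purely as an illustration of the configuration-driven selection described in the quoted proposal, here is a rough sketch of one way it could work: an ordered list of rules, where the first matching rule picks the delegate and the primary is the fallback. None of this exists in Oak or in the POC. DelegateSelector, BinaryInfo, and the delegate names are hypothetical, and first-match-wins ordering is just one assumption about how combined criteria might be resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: picks a delegate name for a binary using ordered rules. */
final class DelegateSelector {

    /** Assumed description of a binary about to be written. */
    static final class BinaryInfo {
        final long size;
        final String jcrPath;

        BinaryInfo(long size, String jcrPath) {
            this.size = size;
            this.jcrPath = jcrPath;
        }
    }

    /** Pairs a selection criterion with the delegate it routes to. */
    private static final class Rule {
        final Predicate<BinaryInfo> matches;
        final String delegateName;

        Rule(Predicate<BinaryInfo> matches, String delegateName) {
            this.matches = matches;
            this.delegateName = delegateName;
        }
    }

    private final String primary;
    private final List<Rule> rules = new ArrayList<>();

    DelegateSelector(String primary) {
        this.primary = primary;
    }

    /** Appends a rule; rules are evaluated in the order they were added. */
    DelegateSelector addRule(Predicate<BinaryInfo> matches, String delegateName) {
        rules.add(new Rule(matches, delegateName));
        return this;
    }

    /** Returns the first matching rule's delegate, or the primary if none match. */
    String select(BinaryInfo info) {
        for (Rule rule : rules) {
            if (rule.matches.test(info)) {
                return rule.delegateName;
            }
        }
        return primary;
    }
}
```

With a size rule (larger than 1 MB goes to "large") added before a path rule (anything under /content/dam/ goes to "assets"), a 5 MB binary under /content/dam/ is routed to "large" because the size rule is checked first, and a binary matching neither rule falls back to the primary.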
