Hi,

I read the proposals and I have a comment wrt the Overlay/Composite
DataStores at a high level.

IIUC, these two are similar except that the Overlay could hold duplicated
binaries, right? After reading the details of both, it seems to me that the
Overlay is a sort of superset of the two, and whether the composed DataStore
holds duplicates can be an implementation detail. For example, it could be a
pluggable/configurable fallback mechanism for the case where no
rules/properties are defined when storing blobs in the DataStore
(rules/properties that can influence the decision of where to redirect the
blobs).
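
To illustrate what I mean, here is a rough sketch of the read side. All names
(FallbackReadSketch, the rule function, delegates modelled as in-memory maps)
are made up for illustration and are not Oak API. If a rule applies, only the
one delegate it names is consulted (the Composite behaviour from the proposal);
if no rule applies, the delegates are tried in their preferred order (the
Overlay behaviour).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch only; none of these types exist in Oak.
class FallbackReadSketch {
    // Insertion order of this map is the preferred lookup order.
    private final Map<String, Map<String, byte[]>> delegates = new LinkedHashMap<>();
    // Rule: names the delegate for a blob id, or is empty if no rule applies.
    private final Function<String, Optional<String>> rule;

    FallbackReadSketch(Function<String, Optional<String>> rule) {
        this.rule = rule;
    }

    void addDelegate(String name, Map<String, byte[]> store) {
        delegates.put(name, store);
    }

    Optional<byte[]> read(String blobId) {
        Optional<String> target = rule.apply(blobId);
        if (target.isPresent()) {
            // Composite behaviour: exactly one correct location, no searching.
            Map<String, byte[]> store = delegates.get(target.get());
            return store == null
                    ? Optional.empty()
                    : Optional.ofNullable(store.get(blobId));
        }
        // Overlay behaviour: try delegates in order until one has the blob.
        for (Map<String, byte[]> store : delegates.values()) {
            byte[] data = store.get(blobId);
            if (data != null) {
                return Optional.of(data);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, duplicates across delegates only matter on the fallback path,
where the first delegate in the configured order wins.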

With this in mind, can't we just conceptually have a Composite DataStore
(not drilling down to interface/class hierarchy and API yet) which can then
support the following:
* A user-provided "type" of blob to influence which logical/physical DataStore
the blob/file is written to (needs some sort of configuration with
pre-defined types mapped to the datastores).
* Optionally defining characteristics of the DataStore(s)
** read/write
** storage class - slow, fast
** priority
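
As a rough sketch of the above (all names here, such as CompositeSketch,
Delegate, StorageClass and resolveForWrite, are made up for illustration and
are not Oak API, and "lower priority value wins" is my assumption): a blob type
mapped in the configuration picks the delegate explicitly, otherwise the
writable delegate with the best priority is chosen implicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Oak.
class CompositeSketch {
    enum StorageClass { SLOW, FAST }

    // Optional characteristics of a delegate DataStore.
    static final class Delegate {
        final String name;
        final boolean writable;          // read/write vs. read-only
        final StorageClass storageClass; // slow, fast
        final int priority;              // assumption: lower value = more preferred

        Delegate(String name, boolean writable, StorageClass storageClass, int priority) {
            this.name = name;
            this.writable = writable;
            this.storageClass = storageClass;
            this.priority = priority;
        }
    }

    private final List<Delegate> delegates = new ArrayList<>();
    // Pre-defined blob types mapped to delegate names (the "configuration").
    private final Map<String, String> typeToDelegate = new HashMap<>();

    void addDelegate(Delegate d) {
        delegates.add(d);
    }

    void mapType(String blobType, String delegateName) {
        typeToDelegate.put(blobType, delegateName);
    }

    Delegate resolveForWrite(String blobType) {
        // Explicit decision: a mapped type selects its writable delegate.
        String mapped = blobType == null ? null : typeToDelegate.get(blobType);
        if (mapped != null) {
            for (Delegate d : delegates) {
                if (d.writable && d.name.equals(mapped)) {
                    return d;
                }
            }
        }
        // Implicit decision: fall back to the most preferred writable delegate.
        return delegates.stream()
                .filter(d -> d.writable)
                .min(Comparator.comparingInt(d -> d.priority))
                .orElseThrow(() -> new IllegalStateException("no writable delegate configured"));
    }
}
```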

In fact, another way to look at them might be as explicit vs. implicit
decision making for the storage of blobs. In that case there will be
occasions where certain blobs are explicitly written to the most preferred
storage class configured, for example Lucene blobs. Also, I think a lot of
the administrative tasks would be the same for both, e.g. moving/copying,
garbage collection, etc.

Also, wrt the usage of the Overlay for UC9, this is still possible if we
map the cache directories to be on NFS. But do you have any tests which show
that this could be a preferred option, or what the impact on performance is?
We didn't give much thought to it, but it looked like this may degrade
performance, as writing to NFS would be slower; in fact, we have a
CachingDataStore option implemented for FileDataStore configured on NFS to
improve performance.

Thanks
Amit

On Wed, Jul 26, 2017 at 5:50 AM, Matt Ryan <o...@mvryan.org> wrote:

> Hi oak-dev,
>
> I’ve written up some proposals on the wiki for blob stores that can
> reference multiple blob storage locations.
>
> Both act as a single logical blob store to Oak and can be treated as a
> single blob store.  Both have at least two “delegate” blob stores managed
> by the primary blob store.
>
> There are two concepts.  One I’m currently calling the Overlay blob store
> (we haven’t voted on this name yet).  In this case, delegates are
> configured with a preferred order of lookup.  When a read is issued, the
> overlay blob store will attempt to satisfy the read by going through the
> delegates in order until one can satisfy the read.  [0]
>
> The second concept is the Composite blob store that was previously
> discussed on-list.  In this case, delegates are configured with rules
> specifying which blobs belong in which delegate, with exactly one delegate
> being specified as the default.  There is only ever exactly one correct
> location for a blob in a composite blob store.  When a read is issued, the
> composite blob store will evaluate the rules to determine which delegate
> should be able to satisfy the request, and then read from that delegate
> only, or fail if it is not found in the delegate.  [1]
>
> As I thought about all the use cases, these two concepts kind of stood out
> in contrast to each other so I thought I would propose that we formalize
> the two as separate but similar concepts.
>
> I would appreciate feedback and discussion on how we can make these useful
> for future Oak versions.  Thanks!
>
>
> -MR
>
>
> [0] - https://wiki.apache.org/jackrabbit/Overlay%20Blob%20Store
> [1] - https://wiki.apache.org/jackrabbit/Composite%20Blob%20Store
>
