Hi,

OakFileDataStore is an extension of the JR2 (Jackrabbit 2) FileDataStore and
implements the methods required for it to work in Oak, e.g. to support data
store garbage collection (DSGC). So in Oak, only OakFileDataStore should be
used.
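For reference, a minimal sketch of wiring OakFileDataStore into a DocumentNodeStore, mirroring the builder snippet quoted later in this thread. The directory path, MongoDB URI, database name, and cluster id below are placeholders, not values from this thread:

```java
import java.io.File;

import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
import org.apache.jackrabbit.oak.plugins.document.MongoDocumentNodeStoreBuilder;

public class OakFileDataStoreExample {
    public static void main(String[] args) throws Exception {
        // OakFileDataStore extends the JR2 FileDataStore and implements
        // SharedDataStore, so multiple cluster nodes may share one location.
        OakFileDataStore fds = new OakFileDataStore();
        fds.init(new File("/mnt/shared/datastore").getAbsolutePath()); // placeholder path

        // Wrap the DataStore so Oak can use it as a BlobStore.
        DataStoreBlobStore blobStore = new DataStoreBlobStore(fds);

        DocumentNodeStore store = new MongoDocumentNodeStoreBuilder()
                .setMongoDB("mongodb://user:password@localhost:27017", "oak", 16) // placeholder URI
                .setClusterId(1) // must be unique per instance within the cluster
                .setBlobStore(blobStore)
                .build();
        try {
            // ... build an Oak repository on top of the node store ...
        } finally {
            store.dispose();
        }
    }
}
```

This requires the oak-store-document and oak-blob-plugins modules (plus a running MongoDB) on the classpath.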

Thanks
Amit

On Tue, Feb 25, 2020 at 6:01 AM Marco Piovesana <pioves...@esteco.com>
wrote:

> Hi guys,
> what's the difference between the FileDataStore and the OakFileDataStore?
> I've seen that one is an extension of the other and that it implements the
> SharedDataStore interface, but I did not find any other documentation on it.
> Is it just the Oak implementation of the same storage? Or are there cases
> where one should be used instead of the other?
>
> Marco.
>
> On Mon, Feb 24, 2020 at 6:43 PM Amit Jain <am...@ieee.org> wrote:
>
> > Hi,
> >
> > CachingFileDataStore is only a sort of wrapper that caches files locally
> > (and uploads asynchronously) when the actual backend is some sort of NFS
> > and is slow for the parameters you care about. OakFileDataStore is what
> > will work for your purpose if you don't care about local caching.
> >
> > >> it feels like some info from this thread should be in the online
> > documentation
> > Feel free to create a patch to update the documentation at
> > https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html with what
> > is missing.
> >
> > Thanks
> > Amit
> >
> > On Sun, Feb 23, 2020 at 4:00 AM jorgeeflorez . <
> > jorgeeduardoflo...@gmail.com>
> > wrote:
> >
> > > Hi Matt,
> > >
> > > > Just be sure that any Oak instances sharing the same file location
> > > > belong to the same logical cluster.
> > > >
> > > > Sharing the same file location between multiple logical instances
> > > > should "work", but certain capabilities like data store GC won't work
> > > > well in that scenario.
> > > >
> > > > That doesn't mean you need a separate file server for each Oak cluster
> > > > though.  One location per cluster should work fine - they could be
> > > > different shares on the same server, or even different folders in the
> > > > same share.
> > >
> > >
> > > I am not sure I am understanding you. I will have a different directory
> > > for each repository, and all Oak instances for the same repository will
> > > use that directory as the file store. Each instance will have its own
> > > clusterId.
> > >
> > > > One question though - you said one customer has servers in Amazon (I
> > > > assume EC2).  Where are they planning to store their binaries - in
> > > > file storage mounted by the VM or in S3?  They may wish to consider
> > > > using an S3 bucket instead and using S3DataStore - might cost less.
> > > >
> > >
> > > Yes, they have EC2 servers. Initially we had the binaries stored in
> > > MongoDB; of course that is not good. So the idea is to store them in the
> > > OS file system, but I think the available space could run out quickly. I
> > > think I once suggested using S3, but I am not sure if they want that. I
> > > will mention it again.
> > >
> > > > TBH I don't see what caching gives you in this scenario.  The caching
> > > > implementation will maintain a local cache of uploaded and downloaded
> > > > files; the intent would be to improve latency, but caches also always
> > > > add complexity.  With OakFileDataStore the files are already "local"
> > > > anyway - even if across a network I don't know how much the cache buys
> > > > you in terms of performance.
> > >
> > >
> > > Yes, although it seemed cool when I read about it and tried it, I think
> > > using CachingFileDataStore could make things a bit more difficult. I
> > > hope that OakFileDataStore will be enough.
> > >
> > > Thank you Matt. With your help, I understand this topic a lot better
> > > (it feels like some info from this thread should be in the online
> > > documentation).
> > >
> > > Best Regards.
> > >
> > > Jorge
> > >
> > > On Fri, Feb 21, 2020 at 6:57 PM Matt Ryan (<mattr...@apache.org>)
> > > wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > On Fri, Feb 21, 2020 at 3:40 PM jorgeeflorez . <
> > > > jorgeeduardoflo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Matt, thanks a lot for your answer.
> > > > >
> > > > > > If your storage is "local" (meaning it appears as a local
> > > > > > filesystem to Oak), I'd probably use OakFileDataStore.  It
> > > > > > implements SharedDataStore so you can share the same location with
> > > > > > multiple instances.  For example if you create a file share on a
> > > > > > NAS and then mount that share on multiple servers - even though
> > > > > > the storage is across the network, it is mounted in the filesystem
> > > > > > and appears local.  OakFileDataStore should work well for this
> > > > > > purpose.
> > > > >
> > > > >
> > > > > I think this would be the case: I will have one or more servers,
> > > > > each one with one or more Oak instances (we handle several
> > > > > repositories), all "using" the same file store. One customer has
> > > > > those servers in the same intranet and another has them in Amazon.
> > > > > But in both cases I could mount a folder that would be "visible" to
> > > > > all servers, right?
> > > > >
> > > >
> > > > Just be sure that any Oak instances sharing the same file location
> > > > belong to the same logical cluster.
> > > >
> > > > Sharing the same file location between multiple logical instances
> > > > should "work", but certain capabilities like data store GC won't work
> > > > well in that scenario.
> > > >
> > > > That doesn't mean you need a separate file server for each Oak cluster
> > > > though.  One location per cluster should work fine - they could be
> > > > different shares on the same server, or even different folders in the
> > > > same share.
> > > >
> > > > One question though - you said one customer has servers in Amazon (I
> > > > assume EC2).  Where are they planning to store their binaries - in
> > > > file storage mounted by the VM or in S3?  They may wish to consider
> > > > using an S3 bucket instead and using S3DataStore - might cost less.
> > > >
> > > >
> > > >
> > > > >
> > > > > Do you think it would be best to use OakFileDataStore over, for
> > > > > example, CachingFileDataStore, to keep things "simple"?
> > > > >
> > > >
> > > > TBH I don't see what caching gives you in this scenario.  The caching
> > > > implementation will maintain a local cache of uploaded and downloaded
> > > > files; the intent would be to improve latency, but caches also always
> > > > add complexity.  With OakFileDataStore the files are already "local"
> > > > anyway - even if across a network I don't know how much the cache buys
> > > > you in terms of performance.
> > > >
> > > >
> > > >
> > > > >
> > > > > > As for DataStoreBlobStore - DataStoreBlobStore is a wrapper
> > > > > > around a class that implements DataStore to make it look like a
> > > > > > BlobStore.
> > > > >
> > > > > I have been using something like this to set up my repository; I do
> > > > > not know if there is another way...
> > > > >
> > > > > FileDataStore fds = new FileDataStore();
> > > > > File dir = ...;
> > > > > fds.init(dir.getAbsolutePath());
> > > > > DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
> > > > > DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder().
> > > > >     setMongoDB("mongodb://user:password@" + host + ":" + port,
> > > > >         "repo1", 16).
> > > > >     setClusterId(123).
> > > > >     setAsyncDelay(10).
> > > > >     setBlobStore(dsbs).
> > > > >     build();
> > > > >
> > > > >
> > > > That looks like the right idea - other than I'd use OakFileDataStore
> > > > instead of FileDataStore.
> > > >
> > > >
> > > > -MR
> > > >
> > >
> >
>
