Hi Matt,

> Just be sure that any Oak instances sharing the same file location belong
> to the same logical cluster.
>
> Sharing the same file location between multiple logical instances should
> "work", but certain capabilities like data store GC won't work well in that
> scenario.
>
> That doesn't mean you need a separate file server for each Oak cluster
> though.  One location per cluster should work fine - they could be
> different shares on the same server, or even different folders in the same
> share.


I am not sure I am understanding you correctly. I will have a different
directory for each repository, and all Oak instances for the same repository
will use that directory as the file store. Each instance will have its own
clusterId - something like the sketch below.
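
To be concrete, this is roughly what I have in mind for two instances of
the same repository (the directory, connection string and clusterIds are
made up):

import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
import org.apache.jackrabbit.oak.plugins.document.mongo.MongoDocumentNodeStoreBuilder;

// Every instance of repo1 points at the same shared directory...
OakFileDataStore fds = new OakFileDataStore();
fds.setPath("/mnt/nas/datastores/repo1");
fds.init(null); // path already set, so no home directory needed

// ...but each instance gets its own clusterId.
DocumentNodeStore ns = new MongoDocumentNodeStoreBuilder().
        setMongoDB("mongodb://user:password@host:27017", "repo1", 16).
        setClusterId(1). // 2, 3, ... on the other instances
        setAsyncDelay(10).
        setBlobStore(new DataStoreBlobStore(fds)).
        build();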

> One question though - you said one customer has servers in Amazon (I assume
> EC2).  Where are they planning to store their binaries - in file storage
> mounted by the VM or in S3?  They may wish to consider using an S3 bucket
> instead and using S3DataStore - might cost less.
>

Yes, they have EC2 servers. Initially we had the binaries stored in
MongoDB, which of course is not good. So the idea is to store them in the
OS file system, but I think the available space could run out quickly. I
think I once suggested using S3 but I am not sure they want that. I will
mention it again.
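
If they do go with S3, my understanding is that the setup would look
something like this - I have not tried it, and the bucket, region and keys
below are placeholders:

import java.util.Properties;
import org.apache.jackrabbit.oak.blob.cloud.s3.S3DataStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;

// Untested sketch - all values are placeholders.
Properties props = new Properties();
props.setProperty("accessKey", "<aws-access-key>");
props.setProperty("secretKey", "<aws-secret-key>");
props.setProperty("s3Bucket", "repo1-binaries");
props.setProperty("s3Region", "us-east-1");

S3DataStore s3ds = new S3DataStore();
s3ds.setProperties(props);
s3ds.init("/var/oak/repo1"); // local directory for staging/caching

// then, as before:
DataStoreBlobStore dsbs = new DataStoreBlobStore(s3ds);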

> TBH I don't see what caching gives you in this scenario.  The caching
> implementation will maintain a local cache of uploaded and downloaded
> files; the intent would be to improve latency, but caches also always add
> complexity.  With OakFileDataStore the files are already "local" anyway -
> even if across a network I don't know how much the cache buys you in terms
> of performance.


Yes, although it seemed cool when I read about it and tried it, I think
using CachingFileDataStore could make things a bit more difficult (see the
sketch below). I hope OakFileDataStore will be enough.
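
For reference, this is roughly what my CachingFileDataStore attempt looked
like (paths made up; I may be missing settings). The extra moving part is
the per-instance local cache directory on top of the shared backend path:

import java.util.Properties;
import org.apache.jackrabbit.oak.plugins.blob.datastore.CachingFileDataStore;

CachingFileDataStore cds = new CachingFileDataStore();
Properties props = new Properties();
// the shared location (what OakFileDataStore would use directly)
props.setProperty("fsBackendPath", "/mnt/nas/datastores/repo1");
cds.setProperties(props);
// init() gets the directory used for the per-instance local cache
cds.init("/var/oak/local-cache");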

Thank you Matt. With your help I understand this topic a lot better (it
feels like some of the info in this thread should be in the online
documentation).

Best Regards.

Jorge

On Fri, Feb 21, 2020 at 6:57 PM Matt Ryan (<mattr...@apache.org>)
wrote:

> Hi Jorge,
>
> On Fri, Feb 21, 2020 at 3:40 PM jorgeeflorez . <
> jorgeeduardoflo...@gmail.com>
> wrote:
>
> > Hi Matt, thanks a lot for your answer.
> >
> > > If your storage is "local" (meaning it appears as a local filesystem to
> > > Oak), I'd probably use OakFileDataStore.  It implements SharedDataStore
> > > so you can share the same location with multiple instances.  For example
> > > if you create a file share on a NAS and then mount that share on multiple
> > > servers - even though the storage is across the network, it is mounted in
> > > the filesystem and appears local.  OakFileDataStore should work well for
> > > this purpose.
> >
> >
> > I think this would be the case: I will have one or more servers, each one
> > with one or more Oak instances (we handle several repositories), all
> > "using" the same file store. One customer has those servers in the same
> > intranet and another has them in Amazon. But in both cases I could mount a
> > folder that would be "visible" to all servers, right?
> >
>
> Just be sure that any Oak instances sharing the same file location belong
> to the same logical cluster.
>
> Sharing the same file location between multiple logical instances should
> "work", but certain capabilities like data store GC won't work well in that
> scenario.
>
> That doesn't mean you need a separate file server for each Oak cluster
> though.  One location per cluster should work fine - they could be
> different shares on the same server, or even different folders in the same
> share.
>
> One question though - you said one customer has servers in Amazon (I assume
> EC2).  Where are they planning to store their binaries - in file storage
> mounted by the VM or in S3?  They may wish to consider using an S3 bucket
> instead and using S3DataStore - might cost less.
>
>
>
> >
> > Do you think it would be best to use OakFileDataStore over, for example,
> > CachingFileDataStore, to keep things "simple"?
> >
>
> TBH I don't see what caching gives you in this scenario.  The caching
> implementation will maintain a local cache of uploaded and downloaded
> files; the intent would be to improve latency, but caches also always add
> complexity.  With OakFileDataStore the files are already "local" anyway -
> even if across a network I don't know how much the cache buys you in terms
> of performance.
>
>
>
> >
> > > As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a
> > > class that implements DataStore to make it look like a BlobStore.
> >
> > I have been using something like this to setup my repository, I do not
> > know if there is another way...
> >
> > FileDataStore fds = new FileDataStore();
> > File dir = ...;
> > fds.init(dir.getAbsolutePath());
> > DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
> > DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder().
> >                     setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16).
> >                     setClusterId(123).
> >                     setAsyncDelay(10).
> >                     setBlobStore(dsbs).
> >                     build();
> >
> >
> That looks like the right idea - other than I'd use OakFileDataStore
> instead of FileDataStore.
>
>
> -MR
>
