Re: CachingFileDataStore vs DataStoreBlobStore

2020-02-24 Thread Amit Jain
Hi,

Yes, you should be using OakFileDataStore. You should be able to just
instantiate it directly.

Thanks
Amit

On Tue, Feb 25, 2020 at 12:04 PM Marco Piovesana 
wrote:

> Hi Amit, thanks for the clarification. Right now I'm using Oak with the
> FileDataStore wrapped into a DataStoreBlobStore. Should I change it to
> OakFileDataStore then? If yes, do I have to do an upgrade for that, or do
> I just need to instantiate my storage with the new class?
>
> Marco.
>
> On Tue, Feb 25, 2020 at 11:52 AM Amit Jain  wrote:
>
> > Hi,
> >
> > OakFileDataStore is an extension of the JR2 FileDataStore and implements
> > the methods Oak requires to support data store garbage collection (DSGC)
> > etc. So, in Oak, only OakFileDataStore should be used.
> >
> > Thanks
> > Amit
> >
> > On Tue, Feb 25, 2020 at 6:01 AM Marco Piovesana 
> > wrote:
> >
> > > Hi guys,
> > > what's the difference between the FileDataStore and the
> > > OakFileDataStore? I've seen that one is an extension of the other and
> > > that it implements the SharedDataStore interface, but I have not found
> > > any other documentation on it. Is it just the Oak implementation of the
> > > same storage? Or are there cases where one should be used instead of
> > > the other?
> > >
> > > Marco.
> > >
> > > On Mon, Feb 24, 2020 at 6:43 PM Amit Jain  wrote:
> > >
> > > > Hi,
> > > >
> > > > CachingFileDataStore is essentially a wrapper that caches files
> > > > locally (and uploads asynchronously) when the actual backend is some
> > > > sort of NFS that is slow for the parameters you care about.
> > > > OakFileDataStore is what will work for your purpose if you don't need
> > > > local caching.
> > > >
> > > > >> it feels like some info from this thread should be in the online
> > > > >> documentation
> > > > Feel free to create a patch to update the documentation at
> > > > https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html with
> > > > what is missing.
> > > >
> > > > Thanks
> > > > Amit
> > > >
> > > > On Sun, Feb 23, 2020 at 4:00 AM jorgeeflorez . <
> > > > jorgeeduardoflo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Matt,
> > > > >
> > > > > > Just be sure that any Oak instances sharing the same file
> > > > > > location belong to the same logical cluster.
> > > > > >
> > > > > > Sharing the same file location between multiple logical instances
> > > > > > should "work", but certain capabilities like data store GC won't
> > > > > > work well in that scenario.
> > > > > >
> > > > > > That doesn't mean you need a separate file server for each Oak
> > > > > > cluster though.  One location per cluster should work fine - they
> > > > > > could be different shares on the same server, or even different
> > > > > > folders in the same share.
> > > > >
> > > > >
> > > > > I am not sure I am understanding you. I will have a different
> > > > > directory for each repository, and all Oak instances for the same
> > > > > repository will use that directory as the file store. Each instance
> > > > > will have its own clusterId.
> > > > >
> > > > > > One question though - you said one customer has servers in
> > > > > > Amazon (I assume EC2).  Where are they planning to store their
> > > > > > binaries - in file storage mounted by the VM or in S3?  They may
> > > > > > wish to consider using an S3 bucket instead and using S3DataStore
> > > > > > - might cost less.
> > > > > >
> > > > >
> > > > > Yes, they have EC2 servers. Initially we had the binaries stored
> > > > > in MongoDB; of course that is not good. So the idea is to store
> > > > > them in the OS file system, but I think the available space could
> > > > > run out quickly. I think I once suggested using S3, but I am not
> > > > > sure if they want that. I will mention it again.
> > > > >
> > > > > > TBH I don't see what caching gives you in this scenario.  The
> > > > > > caching implementation will maintain a local cache of uploaded
> > > > > > and downloaded files; the intent would be to improve latency, but
> > > > > > caches also always add complexity.  With OakFileDataStore the
> > > > > > files are already "local" anyway - even if across a network I
> > > > > > don't know how much the cache buys you in terms of performance.
> > > > >
> > > > >
> > > > > Yes, although it seemed cool when I read about and tried it, I
> > > > > think using CachingFileDataStore could make things a bit more
> > > > > difficult. I hope that OakFileDataStore will be enough.
> > > > >
> > > > > Thank you Matt. With your help, I understand this topic a lot
> > > > > better (it feels like some info from this thread should be in the
> > > > > online documentation).
> > > > >
> > > > > Best Regards.
> > > > >
> > > > > Jorge
> > > > >
> > > > > On Fri, Feb 21, 2020 at 18:57, Matt Ryan () wrote:
> > > > >
> > > > > > Hi Jorge,
> > > > > >
> > > > > > On Fri, Feb 21, 2020 at 3:40 PM 

Re: CachingFileDataStore vs DataStoreBlobStore

2020-02-21 Thread Matt Ryan
Hi Jorge,

On Fri, Feb 21, 2020 at 3:40 PM jorgeeflorez . 
wrote:

> Hi Matt, thanks a lot for your answer.
>
> > If your storage is "local" (meaning it appears as a local filesystem to
> > Oak), I'd probably use OakFileDataStore.  It implements SharedDataStore
> > so you can share the same location with multiple instances.  For example
> > if you create a file share on a NAS and then mount that share on
> > multiple servers - even though the storage is across the network, it is
> > mounted in the filesystem and appears local.  OakFileDataStore should
> > work well for this purpose.
>
>
> I think this would be the case: I will have one or more servers, each one
> with one or more Oak instances (we handle several repositories), all
> "using" the same file store. One customer has those servers in the same
> intranet and another has them in Amazon. But in both cases I could mount a
> folder that would be "visible" to all servers, right?
>

Just be sure that any Oak instances sharing the same file location belong
to the same logical cluster.

Sharing the same file location between multiple logical instances should
"work", but certain capabilities like data store GC won't work well in that
scenario.

That doesn't mean you need a separate file server for each Oak cluster
though.  One location per cluster should work fine - they could be
different shares on the same server, or even different folders in the same
share.

One question though - you said one customer has servers in Amazon (I assume
EC2).  Where are they planning to store their binaries - in file storage
mounted by the VM or in S3?  They may wish to consider using an S3 bucket
instead and using S3DataStore - might cost less.



>
> Do you think it would be best to use OakFileDataStore over, for example,
> CachingFileDataStore, to keep things "simple"?
>

TBH I don't see what caching gives you in this scenario.  The caching
implementation will maintain a local cache of uploaded and downloaded
files; the intent is to improve latency, but caches also always add
complexity.  With OakFileDataStore the files are already "local" anyway -
even when they are accessed across a network, I don't know how much the
cache buys you in terms of performance.



>
> As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a class
> > that implements DataStore to make it look like a BlobStore.
> >
> > I have been using something like this to setup my repository, I do not
> know if there is another way...
>
> FileDataStore fds = new FileDataStore();
> File dir = ...;
> fds.init(dir.getAbsolutePath());
> DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
> DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder()
>     .setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16)
>     .setClusterId(123)
>     .setAsyncDelay(10)
>     .setBlobStore(dsbs)
>     .build();
>
>
That looks like the right idea - other than I'd use OakFileDataStore
instead of FileDataStore.
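
For reference, a hedged sketch of that change (assuming the Oak blob and
document-store modules on the classpath; `dir`, `host`, and `port` are the
same variables from your snippet):

```java
// Sketch only: same wiring as your snippet, with OakFileDataStore
// swapped in so the location can be shared by other cluster members.
OakFileDataStore fds = new OakFileDataStore();
fds.init(dir.getAbsolutePath());
DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder()
    .setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16)
    .setClusterId(123)
    .setAsyncDelay(10)
    .setBlobStore(dsbs)
    .build();
```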


-MR


Re: CachingFileDataStore vs DataStoreBlobStore

2020-02-21 Thread jorgeeflorez .
Hi Matt, thanks a lot for your answer.

If your storage is "local" (meaning it appears as a local filesystem to
> Oak), I'd probably use OakFileDataStore.  It implements SharedDataStore so
> you can share the same location with multiple instances.  For example if
> you create a file share on a NAS and then mount that share on multiple
> servers - even though the storage is across the network, it is mounted in
> the filesystem and appears local.  OakFileDataStore should work well for
> this purpose.


I think this would be the case: I will have one or more servers, each one
with one or more Oak instances (we handle several repositories), all
"using" the same file store. One customer has those servers in the same
intranet and another has them in Amazon. But in both cases I could mount a
folder that would be "visible" to all servers, right?

Do you think it would be best to use OakFileDataStore over, for example,
CachingFileDataStore, to keep things "simple"?

As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a class
> that implements DataStore to make it look like a BlobStore.
>
> I have been using something like this to setup my repository, I do not
know if there is another way...

FileDataStore fds = new FileDataStore();
File dir = ...;
fds.init(dir.getAbsolutePath());
DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder()
    .setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16)
    .setClusterId(123)
    .setAsyncDelay(10)
    .setBlobStore(dsbs)
    .build();


Jorge


On Fri, Feb 21, 2020 at 4:36 PM Matt Ryan ()
wrote:

> Hi,
>
> I think I probably will need a bit more information about your use case to
> know how to help you best; can you provide a bit more detail about your
> environment and what you are hoping to accomplish?
>
> If your storage is "local" (meaning it appears as a local filesystem to
> Oak), I'd probably use OakFileDataStore.  It implements SharedDataStore so
> you can share the same location with multiple instances.  For example if
> you create a file share on a NAS and then mount that share on multiple
> servers - even though the storage is across the network, it is mounted in
> the filesystem and appears local.  OakFileDataStore should work well for
> this purpose.
>
> The other common use case for a shared storage location is cloud-based
> storage, like AWS S3.  In this case use S3DataStore (for AWS S3) or
> AzureDataStore (for Microsoft Azure Blob Storage).
>
> Do you have a different use case than one of these?
>
>
> As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a class
> that implements DataStore to make it look like a BlobStore.  For reasons
> I'm not fully aware of (happened before my time - probably historical),
> binary object storage in Oak is usually available as an implementation of
> the Jackrabbit DataStore interface but Oak interacts with these as
> BlobStores.  You will usually set up your repository something like this:
>    DataStore ds = new OakFileDataStore(); // or whatever DataStore type you choose
>ds.init(dataStoreHomeDirectory);
>BlobStore blobStore = new DataStoreBlobStore(ds);
> Then you would use the blobStore to create the FileStore that your node
> store requires.
>
>
> -MR
>
> On Fri, Feb 21, 2020 at 2:03 PM jorgeeflorez . <
> jorgeeduardoflo...@gmail.com>
> wrote:
>
> > Hi,
> > I am trying to pick a data store with the purpose of avoiding binary
> > storage in MongoDB's blobs collection. I would like to know which one I
> > should choose for production.
> > I have explored a bit (version 1.12) and my guess is that
> > DataStoreBlobStore should be used when you want to store files in a local
> > directory (only one Oak instance accessing the files), whereas
> > CachingFileDataStore should be used if the folder where you want to store
> > files is located on another host and can be seen from the machine running
> > Oak (several Oak instances can access the files). Is this correct?
> >
>


Re: CachingFileDataStore vs DataStoreBlobStore

2020-02-21 Thread Matt Ryan
Hi,

I think I probably will need a bit more information about your use case to
know how to help you best; can you provide a bit more detail about your
environment and what you are hoping to accomplish?

If your storage is "local" (meaning it appears as a local filesystem to
Oak), I'd probably use OakFileDataStore.  It implements SharedDataStore so
you can share the same location with multiple instances.  For example if
you create a file share on a NAS and then mount that share on multiple
servers - even though the storage is across the network, it is mounted in
the filesystem and appears local.  OakFileDataStore should work well for
this purpose.

The other common use case for a shared storage location is cloud-based
storage, like AWS S3.  In this case use S3DataStore (for AWS S3) or
AzureDataStore (for Microsoft Azure Blob Storage).

Do you have a different use case than one of these?


As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a class
that implements DataStore to make it look like a BlobStore.  For reasons
I'm not fully aware of (happened before my time - probably historical),
binary object storage in Oak is usually available as an implementation of
the Jackrabbit DataStore interface but Oak interacts with these as
BlobStores.  You will usually set up your repository something like this:
   DataStore ds = new OakFileDataStore(); // or whatever DataStore type you choose
   ds.init(dataStoreHomeDirectory);
   BlobStore blobStore = new DataStoreBlobStore(ds);
Then you would use the blobStore to create the FileStore that your node
store requires.
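
For a segment (TarMK) node store, for example, that last step might look
like this sketch (class names are from oak-segment-tar; the paths are
placeholders, not values from this thread):

```java
// Sketch: plugging the wrapped BlobStore into a segment FileStore,
// then building the node store on top of it.
DataStore ds = new OakFileDataStore();
ds.init("/path/to/datastore");          // shared data store folder
BlobStore blobStore = new DataStoreBlobStore(ds);
FileStore fileStore = FileStoreBuilder
    .fileStoreBuilder(new File("/path/to/segmentstore"))
    .withBlobStore(blobStore)
    .build();
SegmentNodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();
```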


-MR

On Fri, Feb 21, 2020 at 2:03 PM jorgeeflorez . 
wrote:

> Hi,
> I am trying to pick a data store with the purpose of avoiding binary
> storage in MongoDB's blobs collection. I would like to know which one I
> should choose for production.
> I have explored a bit (version 1.12) and my guess is that
> DataStoreBlobStore should be used when you want to store files in a local
> directory (only one Oak instance accessing the files), whereas
> CachingFileDataStore should be used if the folder where you want to store
> files is located on another host and can be seen from the machine running
> Oak (several Oak instances can access the files). Is this correct?
>


Re: CachingFileDataStore vs DataStoreBlobStore

2020-02-21 Thread jorgeeflorez .
Ok, I meant DataStoreBlobStore wrapping a FileDataStore and
DataStoreBlobStore wrapping a CachingFileDataStore (I am still confused I
guess)...

On Fri, Feb 21, 2020 at 4:03 PM jorgeeflorez . (<
jorgeeduardoflo...@gmail.com>) wrote:

> Hi,
> I am trying to pick a data store with the purpose of avoiding binary
> storage in MongoDB's blobs collection. I would like to know which one I
> should choose for production.
> I have explored a bit (version 1.12) and my guess is that
> DataStoreBlobStore should be used when you want to store files in a local
> directory (only one Oak instance accessing the files), whereas
> CachingFileDataStore should be used if the folder where you want to store
> files is located on another host and can be seen from the machine running
> Oak (several Oak instances can access the files). Is this correct?
>