Re: CachingFileDataStore vs DataStoreBlobStore
Hi,

Yes, you should be using OakFileDataStore. You should be able to just
instantiate your storage with this class.

Thanks
Amit
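As a sketch of what "just instantiate with this" could look like, here is a
minimal example. It assumes the Oak 1.x classes `OakFileDataStore` and
`DataStoreBlobStore` from `org.apache.jackrabbit.oak.plugins.blob.datastore`;
treat the exact setup as an assumption, since it may differ slightly between
Oak versions:

```java
import java.io.File;

import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore;

public class BlobStoreSetup {

    public static DataStoreBlobStore createBlobStore(File dir) {
        // OakFileDataStore extends the JR2 FileDataStore, so it is
        // initialized the same way; only the class name changes.
        OakFileDataStore fds = new OakFileDataStore();
        fds.init(dir.getAbsolutePath());
        // Oak's SPI expects a BlobStore, so wrap the DataStore as before.
        return new DataStoreBlobStore(fds);
    }
}
```

Because OakFileDataStore extends the JR2 FileDataStore and keeps the same
on-disk layout, swapping the class in should be enough, which is presumably
why no separate upgrade step was suggested here.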
On Tue, Feb 25, 2020 at 12:04 PM, Marco Piovesana wrote:

Hi Amit, thanks for the clarification. Right now I'm using Oak with the
FileDataStore wrapped into a DataStoreBlobStore. Should I change it to
OakFileDataStore then? If yes, do I have to do an upgrade for that, or do I
just need to instantiate my storage with the new class?

Marco.
On Tue, Feb 25, 2020 at 11:52 AM, Amit Jain wrote:

Hi,

OakFileDataStore is an extension of the JR2 FileDataStore and implements the
methods required to work in Oak, e.g. to support data store garbage
collection (DSGC). So in Oak, only OakFileDataStore should be used.

Thanks
Amit
On Tue, Feb 25, 2020 at 6:01 AM, Marco Piovesana wrote:

Hi guys,
what's the difference between FileDataStore and OakFileDataStore? I've seen
that one is an extension of the other and that it implements the
SharedDataStore interface, but I did not find any other documentation on it.
Is it just the Oak implementation of the same storage, or are there cases
where one should be used instead of the other?

Marco.
On Mon, Feb 24, 2020 at 6:43 PM, Amit Jain wrote:

Hi,

CachingFileDataStore is essentially a wrapper that caches files locally (and
uploads asynchronously) for cases where the actual backend is some sort of
NFS and is too slow for the parameters you care about. OakFileDataStore is
what will work for your purpose if you don't care about local caching.

>> it feels like some info from this thread should be in the online
>> documentation

Feel free to create a patch to update the documentation at
https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html with what is
missing.

Thanks
Amit
On Sun, Feb 23, 2020 at 4:00 AM, jorgeeflorez (jorgeeduardoflo...@gmail.com)
wrote:

Hi Matt,

> Just be sure that any Oak instances sharing the same file location belong
> to the same logical cluster. Sharing the same file location between
> multiple logical instances should "work", but certain capabilities like
> data store GC won't work well in that scenario.

I am not sure I am understanding you. I will have a different directory for
each repository, and all Oak instances for the same repository will use that
directory as the file store. Each instance will have its own clusterId.

> One question though - you said one customer has servers in Amazon (I
> assume EC2). Where are they planning to store their binaries - in file
> storage mounted by the VM or in S3? They may wish to consider using an S3
> bucket instead and using S3DataStore - might cost less.

Yes, they have EC2 servers. Initially we had the binaries stored in MongoDB;
of course, that is not good. So the idea is to store them in the OS file
system, but I think the available space could run out quickly. I think I
once suggested using S3, but I am not sure they want that. I will mention it
again.

> TBH I don't see what caching gives you in this scenario.

Yes, although it seemed cool when I read about it and tried it, I think
using CachingFileDataStore could make things a bit more difficult. I hope
that OakFileDataStore will be enough.

Thank you Matt. With your help, I understand this topic a lot better (it
feels like some info from this thread should be in the online
documentation).

Best Regards,

Jorge
Re: CachingFileDataStore vs DataStoreBlobStore
Hi Jorge,

On Fri, Feb 21, 2020 at 3:40 PM jorgeeflorez . wrote:

> Hi Matt, thanks a lot for your answer.
>
> I think this would be the case: I will have one or more servers, each one
> with one or more Oak instances (we handle several repositories), all
> "using" the same file store. One customer has those servers in the same
> intranet and another has them in Amazon. But in both cases I could mount
> a folder that would be "visible" to all servers, right?

Just be sure that any Oak instances sharing the same file location belong
to the same logical cluster. Sharing the same file location between
multiple logical instances should "work", but certain capabilities like
data store GC won't work well in that scenario.

That doesn't mean you need a separate file server for each Oak cluster,
though. One location per cluster should work fine - they could be
different shares on the same server, or even different folders in the
same share.

One question though - you said one customer has servers in Amazon (I
assume EC2). Where are they planning to store their binaries - in file
storage mounted by the VM, or in S3? They may wish to consider using an
S3 bucket and S3DataStore instead - it might cost less.

> Do you think it would be best to use OakFileDataStore over, for example,
> CachingFileDataStore, to keep things "simple"?

TBH I don't see what caching gives you in this scenario. The caching
implementation maintains a local cache of uploaded and downloaded files;
the intent is to improve latency, but caches also always add complexity.
With OakFileDataStore the files are already "local" anyway - even across
a network, I don't know how much the cache buys you in terms of
performance.

> I have been using something like this to set up my repository, I do not
> know if there is another way...
>
>     FileDataStore fds = new FileDataStore();
>     File dir = ...;
>     fds.init(dir.getAbsolutePath());
>     DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
>     DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder().
>             setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16).
>             setClusterId(123).
>             setAsyncDelay(10).
>             setBlobStore(dsbs).
>             build();

That looks like the right idea - other than I'd use OakFileDataStore
instead of FileDataStore.

-MR
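[Editor's sketch] The setup recommended above - the same builder chain, but with OakFileDataStore swapped in for FileDataStore - might look like the following. This is a hedged sketch, not code from the thread: it assumes an Oak 1.x classpath (oak-core, oak-store-document, oak-blob-plugins and the MongoDB driver), and the directory, Mongo URI, and cluster id are placeholder parameters.

```java
import java.io.File;

import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
import org.apache.jackrabbit.oak.plugins.document.MongoDocumentNodeStoreBuilder;

public class RepoSetup {

    public static DocumentNodeStore build(File dataStoreDir, String mongoUri,
                                          int clusterId) throws Exception {
        // OakFileDataStore implements SharedDataStore, so several cluster
        // nodes can point at the same (possibly network-mounted) directory.
        OakFileDataStore fds = new OakFileDataStore();
        fds.init(dataStoreDir.getAbsolutePath());

        // Adapt the Jackrabbit DataStore to Oak's BlobStore interface.
        DataStoreBlobStore blobStore = new DataStoreBlobStore(fds);

        return new MongoDocumentNodeStoreBuilder()
                .setMongoDB(mongoUri, "repo1", 16)
                .setClusterId(clusterId)   // must be unique per cluster node
                .setAsyncDelay(10)
                .setBlobStore(blobStore)
                .build();
    }
}
```

As the thread notes, each Oak instance sharing the directory needs its own clusterId, and all of them must belong to the same logical cluster for data store GC to behave.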
Re: CachingFileDataStore vs DataStoreBlobStore
Hi Matt, thanks a lot for your answer.

> If your storage is "local" (meaning it appears as a local filesystem to
> Oak), I'd probably use OakFileDataStore. It implements SharedDataStore
> so you can share the same location with multiple instances.

I think this would be the case: I will have one or more servers, each one
with one or more Oak instances (we handle several repositories), all
"using" the same file store. One customer has those servers in the same
intranet and another has them in Amazon. But in both cases I could mount a
folder that would be "visible" to all servers, right?

Do you think it would be best to use OakFileDataStore over, for example,
CachingFileDataStore, to keep things "simple"?

> As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a
> class that implements DataStore to make it look like a BlobStore.

I have been using something like this to set up my repository, I do not
know if there is another way...

    FileDataStore fds = new FileDataStore();
    File dir = ...;
    fds.init(dir.getAbsolutePath());
    DataStoreBlobStore dsbs = new DataStoreBlobStore(fds);
    DocumentNodeStore docStore = new MongoDocumentNodeStoreBuilder().
            setMongoDB("mongodb://user:password@" + host + ":" + port, "repo1", 16).
            setClusterId(123).
            setAsyncDelay(10).
            setBlobStore(dsbs).
            build();

Jorge

On Fri, Feb 21, 2020 at 4:36 PM Matt Ryan () wrote:

> Hi,
>
> I think I probably will need a bit more information about your use case
> to know how to help you best; can you provide a bit more detail about
> your environment and what you are hoping to accomplish?
Re: CachingFileDataStore vs DataStoreBlobStore
Hi,

I think I probably will need a bit more information about your use case
to know how to help you best; can you provide a bit more detail about
your environment and what you are hoping to accomplish?

If your storage is "local" (meaning it appears as a local filesystem to
Oak), I'd probably use OakFileDataStore. It implements SharedDataStore so
you can share the same location with multiple instances. For example, if
you create a file share on a NAS and then mount that share on multiple
servers - even though the storage is across the network, it is mounted in
the filesystem and appears local. OakFileDataStore should work well for
this purpose.

The other common use case for a shared storage location is cloud-based
storage, like AWS S3. In this case use S3DataStore (for AWS S3) or
AzureDataStore (for Microsoft Azure Blob Storage).

Do you have a different use case than one of these?

As for DataStoreBlobStore - DataStoreBlobStore is a wrapper around a
class that implements DataStore to make it look like a BlobStore. For
reasons I'm not fully aware of (it happened before my time - probably
historical), binary object storage in Oak is usually available as an
implementation of the Jackrabbit DataStore interface, but Oak interacts
with these as BlobStores. You will usually set up your repository
something like this:

    DataStore ds = new OakFileDataStore(); // or whatever DataStore type you choose
    ds.init(dataStoreHomeDirectory);
    BlobStore blobStore = new DataStoreBlobStore(ds);

Then you would use the blobStore to create the FileStore that your node
store requires.

-MR

On Fri, Feb 21, 2020 at 2:03 PM jorgeeflorez . wrote:

> Hi,
> I am trying to pick one data store with the purpose of avoiding binary
> storage in the MongoDB blobs collection. I would like to know which one
> I should choose for production.
> I have explored a bit (1.12 version) and my guess is that
> DataStoreBlobStore should be used when you want to store files in a
> local directory (one Oak instance only accessing the files), whereas
> CachingFileDataStore should be used if the folder where you want to
> store files is located on another host and can be seen from the machine
> running Oak (several Oak instances can be accessing the files). Is this
> correct?
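[Editor's sketch] Matt's three-line snippet above, filled out into a compilable sketch that carries it through to the FileStore and node store he mentions. This is an assumption-laden illustration, not code from the thread: it presumes oak-segment-tar and oak-blob-plugins on the classpath, and the two directory parameters are placeholders.

```java
import java.io.File;

import org.apache.jackrabbit.core.data.DataStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore;
import org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore;
import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public class SegmentSetup {

    public static NodeStore build(File segmentDir, File dataStoreDir) throws Exception {
        // The DataStore holds the binaries (blobs)...
        DataStore ds = new OakFileDataStore();
        ds.init(dataStoreDir.getAbsolutePath());

        // ...and DataStoreBlobStore makes it look like a BlobStore to Oak.
        DataStoreBlobStore blobStore = new DataStoreBlobStore(ds);

        // The segment FileStore keeps node data locally and delegates
        // binary storage to the blob store.
        FileStore fileStore = FileStoreBuilder.fileStoreBuilder(segmentDir)
                .withBlobStore(blobStore)
                .build();

        return SegmentNodeStoreBuilders.builder(fileStore).build();
    }
}
```

The same DataStoreBlobStore wrapping works unchanged with a DocumentNodeStore (as in Jorge's snippet later in the thread); only the node store construction differs.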
Re: CachingFileDataStore vs DataStoreBlobStore
Ok, I meant DataStoreBlobStore wrapping a FileDataStore versus
DataStoreBlobStore wrapping a CachingFileDataStore (I am still confused,
I guess)...

On Fri, Feb 21, 2020 at 4:03 PM jorgeeflorez . (<
jorgeeduardoflo...@gmail.com>) wrote:

> Hi,
> I am trying to pick one data store with the purpose of avoiding binary
> storage in the MongoDB blobs collection. I would like to know which one
> I should choose for production.
> I have explored a bit (1.12 version) and my guess is that
> DataStoreBlobStore should be used when you want to store files in a
> local directory (one Oak instance only accessing the files), whereas
> CachingFileDataStore should be used if the folder where you want to
> store files is located on another host and can be seen from the machine
> running Oak (several Oak instances can be accessing the files). Is this
> correct?