Re: How does oak cluster work
Hi Team,

I'm still very interested in understanding some of the design choices the Oak core team has made, and why. What is the use case for long-lived snapshots? I would also like to understand how indexes are synced between nodes, what the role of an Oak leader is, and how leader election occurs.

Thanks
Emily

On Thu, Dec 20, 2018 at 3:02 PM ems eril wrote:
> Hi Marcel, thanks for the information. I would love to understand the
> use cases for having long-lived snapshots in Oak. Would you be able to
> provide specific examples or functions within Oak that need this
> capability?
Re: How does oak cluster work
Hi Marcel,

Thanks for the information. I would love to understand the use cases for having long-lived snapshots in Oak. Would you be able to provide specific examples or functions within Oak that need this capability?

On Wed, Dec 19, 2018 at 12:43 AM Marcel Reutegger wrote:
> Hi,
>
> On 18.12.18, 01:55, "ems eril" wrote:
> > 1) Is this a blocking call? Are there any plans for callback or Java
> > Future support?
>
> Yes, Clusterable.isVisible() is a blocking call and you can give it a
> timeout. There are no plans right now to add an async variant of this
> feature.
>
> > 2) Is there any JCR-level API we can use, as it is currently very low
> > level?
>
> No, there is no JCR/Jackrabbit API equivalent for this feature.
>
> > If not, does Sling have any plans to use this?
>
> You will have to ask this on the Sling list.
>
> > 3) Is there any reason why the document store needs to implement
> > revision snapshotting? Why can't we leverage existing database
> > capabilities such as MongoDB's
> > https://docs.mongodb.com/manual/core/wiredtiger/, as most support MVCC?
>
> In Oak we have the requirement to keep a snapshot of the repository for
> a longer period of time and not just for concurrency control. E.g. you
> can create a checkpoint with a lifetime of several days or even months [0].
>
> Regards
> Marcel
>
> [0] https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeStore.html#checkpoint-long-java.util.Map-
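
The checkpoint idea Marcel describes can be illustrated with a small, self-contained sketch. This is not Oak code: `ToyCheckpointStore`, its in-memory maps, and the injected clock are hypothetical stand-ins, and only the general shape (create a checkpoint with a lifetime in milliseconds, retrieve it later, release it) loosely follows the NodeStore javadoc linked as [0]. What it tries to show is that a checkpoint pins a readable snapshot for a caller-chosen lifetime, regardless of later writes, which is a different job from short-lived MVCC snapshots used for concurrency control.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical in-memory toy, for illustration only; not an Oak class. */
class ToyCheckpointStore {

    /** A frozen copy of the content plus the time at which it expires. */
    private static final class Checkpoint {
        final Map<String, String> snapshot;
        final long expiresAtMillis;

        Checkpoint(Map<String, String> snapshot, long expiresAtMillis) {
            this.snapshot = snapshot;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, String> head = new HashMap<>();
    private final Map<String, Checkpoint> checkpoints = new HashMap<>();
    private final LongSupplier clock; // injected so the example is deterministic
    private long nextId = 1;

    ToyCheckpointStore(LongSupplier clock) {
        this.clock = clock;
    }

    /** Writes to the current (head) state only. */
    void put(String path, String value) {
        head.put(path, value);
    }

    /** Returns a read-only copy of the current (head) state. */
    Map<String, String> head() {
        return Collections.unmodifiableMap(new HashMap<>(head));
    }

    /** Freezes the current state for the given lifetime and returns a reference to it. */
    String checkpoint(long lifetimeMillis) {
        if (lifetimeMillis <= 0) {
            throw new IllegalArgumentException("lifetime must be > 0");
        }
        String id = "cp-" + nextId++;
        Map<String, String> frozen = Collections.unmodifiableMap(new HashMap<>(head));
        checkpoints.put(id, new Checkpoint(frozen, clock.getAsLong() + lifetimeMillis));
        return id;
    }

    /** Returns the frozen state, or null if the checkpoint is unknown or has expired. */
    Map<String, String> retrieve(String id) {
        Checkpoint cp = checkpoints.get(id);
        if (cp == null) {
            return null;
        }
        if (clock.getAsLong() >= cp.expiresAtMillis) {
            checkpoints.remove(id); // expired: forget it
            return null;
        }
        return cp.snapshot;
    }

    /** Drops a checkpoint early; returns true if it existed. */
    boolean release(String id) {
        return checkpoints.remove(id) != null;
    }
}
```

In this toy, a caller that took a checkpoint before a write still reads the old value through `retrieve(...)` while `head()` already shows the new one, until the lifetime elapses or the checkpoint is released.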
Re: How does oak cluster work
Thanks Marcel, this is very helpful. I have a couple of questions about this interface:

1) Is this a blocking call? Are there any plans for callback or Java Future support?
2) Is there any JCR-level API we can use, as it is currently very low level? If not, does Sling have any plans to use this?
3) Is there any reason why the document store needs to implement revision snapshotting? Why can't we leverage existing database capabilities such as MongoDB's https://docs.mongodb.com/manual/core/wiredtiger/, as most support MVCC?

Thanks
Emily

On Sun, Dec 16, 2018 at 11:58 PM Marcel Reutegger wrote:
> Hi,
>
> There are different ways to approach this in Oak.
>
> Your application can register an event listener and get notified about
> changes when they are visible on the local cluster node.
>
> The application can store a visibility token with the job data you have
> in Kafka. The visibility token concept is described on the Clusterable [0]
> interface, which is an extension to the NodeStore implemented by the
> DocumentNodeStore. On the processing cluster node the visibility token is
> then used to suspend the job until the changes are visible.
>
> Regards
> Marcel
>
> [0] https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/Clusterable.html
>
> On 15.12.18, 02:23, "ems eril" wrote:
> > Hi Matt,
> >
> > Yes, you're correct, the job is triggered by a consumer listening to a
> > Kafka queue. But I have to disagree with your earlier statement that
> > this is not an Oak issue. In Mongo you can control the write concern
> > and make replication synchronous, but we cannot do something similar
> > in Oak.
> >
> > Thanks
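
The visibility-token flow Marcel outlines (take a token on the writing node, ship it with the job, block on the processing node until the change is visible) can be sketched as follows. The sketch is self-contained and does not use Oak: `VisibilityAware` is a local stand-in that only mirrors the two Clusterable methods discussed in this thread, while `ToyClusterNode`, its revision counter, and the background "sync" thread are hypothetical. In a real setup the token would be obtained from the DocumentNodeStore and travel inside the Kafka message.

```java
import java.util.concurrent.TimeUnit;

/** Local stand-in mirroring the two Clusterable methods discussed in the thread. */
interface VisibilityAware {
    String getVisibilityToken();

    boolean isVisible(String visibilityToken, long maxWaitMillis) throws InterruptedException;
}

/** Hypothetical toy node: "visibility" is just a revision counter. */
class ToyClusterNode implements VisibilityAware {
    private long visibleRevision; // highest revision visible on this node

    @Override
    public synchronized String getVisibilityToken() {
        return Long.toString(visibleRevision);
    }

    /** Blocks until the token's revision is visible here, or the timeout elapses. */
    @Override
    public synchronized boolean isVisible(String visibilityToken, long maxWaitMillis)
            throws InterruptedException {
        long wanted = Long.parseLong(visibilityToken);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (visibleRevision < wanted) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out, change still not visible
            }
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }

    /** Simulates a local write or a background sync from other cluster nodes. */
    synchronized void advanceTo(long revision) {
        if (revision > visibleRevision) {
            visibleRevision = revision;
            notifyAll(); // wake up any job suspended in isVisible()
        }
    }
}

class VisibilityTokenDemo {
    /** Returns true if the worker node saw the writer's change within maxWaitMillis. */
    static boolean run(long syncDelayMillis, long maxWaitMillis) {
        ToyClusterNode writer = new ToyClusterNode();
        ToyClusterNode worker = new ToyClusterNode();

        writer.advanceTo(1); // the upload is committed on the writing node
        String token = writer.getVisibilityToken(); // would be stored with the job data

        // Hypothetical background sync that makes the change visible on the worker later.
        Thread sync = new Thread(() -> {
            try {
                Thread.sleep(syncDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            worker.advanceTo(Long.parseLong(token));
        });
        sync.setDaemon(true);
        sync.start();

        try {
            // The job is suspended here until the change is visible or the timeout hits.
            return worker.isVisible(token, maxWaitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a generous timeout, `VisibilityTokenDemo.run(50, 5000)` lets the job proceed once the simulated sync arrives; with a timeout shorter than the sync delay, the call returns false and the job would have to retry or be re-queued. Since the call is blocking, an application that wants a future-style API could run it on its own executor.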
Re: How does oak cluster work
Hi Matt,

Yes, you're correct, the job is triggered by a consumer listening to a Kafka queue. But I have to disagree with your earlier statement that this is not an Oak issue. In Mongo you can control the write concern and make replication synchronous, but we cannot do something similar in Oak.

Thanks

On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan wrote:
> Hi,
>
> I believe your concern is: Content could be uploaded to the cluster via
> one Oak instance, and your job to process the content runs in a different
> Oak instance, and that there is a possibility that the job to process the
> content reads from a MongoDB node that has stale data, so the content is
> not available yet.
>
> If I've understood your concern correctly, you are correct that this is
> something you have to worry about, that there is a possibility that when
> the job runs it gets stale data because where it reads from has not been
> updated yet. However, that's not something being caused by Oak; this would
> be something you'd have to deal with whether Oak was there or not, no
> matter what type of backing database cluster was being used.
>
> Maybe I'm still missing something in your question. How are you planning
> to trigger your job?
>
> On Fri, Dec 14, 2018 at 1:01 PM ems eril wrote:
> > Hi Matt,
> >
> > I was looking for more details on the inner workings. I came across
> > this https://markmail.org/message/jbkrsmz3krllqghr where it mentioned
> > that changes in the cluster would eventually appear across other nodes,
> > and that this is not a Mongo-specific issue but something Oak has
> > introduced. I can set the write concern to majority in Mongo, but if
> > Oak has its own eventual consistency model this can cause stale reads
> > from other nodes, which would be a problem with the distributed job I'm
> > trying to create.
> >
> > Thanks
Re: How does oak cluster work
Hi Matt,

I was looking for more details on the inner workings. I came across this https://markmail.org/message/jbkrsmz3krllqghr where it mentioned that changes in the cluster would eventually appear across other nodes, and that this is not a Mongo-specific issue but something Oak has introduced. I can set the write concern to majority in Mongo, but if Oak has its own eventual consistency model this can cause stale reads from other nodes, which would be a problem with the distributed job I'm trying to create.

Thanks

On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan wrote:
> Hi Emily,
>
> Content is stored in Oak in two different configurable storage services.
> This is a bit of an oversimplification, but basically the structure of
> the content repository - the content tree, nodes, properties, etc. - is
> stored in a Node Store [0] and the binary content is stored in a Blob
> Store [1] (you'll also sometimes see the term "data store"). Oak manages
> all of this transparently to external clients.
>
> Oak clustering is therefore achieved by configuring Oak instances to use
> clusterable storage services underneath [2]. For the node store, an
> implementation of a DocumentNodeStore [3] is needed; one such
> implementation uses MongoDB [4]. For the blob store, an implementation of
> a SharedDataStore is needed. For example, both the SharedS3DataStore and
> AzureDataStore implementations can be used as a data store for an Oak
> cluster.
>
> So, assume you were using MongoDB and S3. Setting up an Oak cluster then
> merely means that you have more than one Oak instance, each of which is
> configured to use the MongoDB cluster as the node store, and S3 as the
> data store.
>
> [0] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> [1] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> [2] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> [3] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> [4] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
>
> Does that help?
>
> -MR
>
> On Thu, Dec 13, 2018 at 5:52 PM ems eril wrote:
> > Hi Team,
> >
> > I'm really interested in understanding how an Oak cluster works and how
> > cluster nodes sync up. These are some of the questions I have:
> >
> > 1) How do the nodes sync?
> > 2) What is the role of Mongo?
> > 3) How do indexes in a cluster work and sync up?
> > 4) What is the distributed model: master/slave or multi-master?
> > 5) What is coordinated by the master node?
> > 6) How is the master node elected?
> >
> > One use case I have is to leverage an Oak cluster to upload
> > images/videos and have a consumer on one of the nodes process them in a
> > distributed way. I would like to avoid unnecessary read checks if
> > possible.
> >
> > Thanks
> >
> > Emily