Thanks Marcel, this is very helpful. I have a couple of questions about this interface:

1) Is this a blocking call? Are there any plans for callback or Java Future support?
2) Is there a JCR-level API we can use, since this one is currently very low level? If not, does Sling have any plans to use this?
3) Is there a reason why the document store needs to implement revision snapshotting? Why can't we leverage the existing capabilities of the underlying database, such as MongoDB's WiredTiger storage engine (https://docs.mongodb.com/manual/core/wiredtiger/), since most of them support MVCC?

Thanks
Emily

On Sun, Dec 16, 2018 at 11:58 PM Marcel Reutegger <mreut...@adobe.com.invalid> wrote:

> Hi,
>
> There are different ways to approach this in Oak.
>
> Your application can register an event listener and get notified about
> changes when they are visible on the local cluster node.
>
> The application can store a visibility token with the job data you have in
> Kafka. The visibility token concept is described on the Clusterable [0]
> interface, which is an extension to the NodeStore implemented by the
> DocumentNodeStore. On the processing cluster node the visibility token is
> then used to suspend the job until the changes are visible.
>
> Regards
> Marcel
>
> [0] https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/Clusterable.html
>
> On 15.12.18, 02:23, "ems eril" <emsro...@gmail.com> wrote:
>
> Hi Matt,
>
> Yes, you're correct, the job is triggered by a consumer listening to a Kafka
> queue. But I have to disagree with your earlier statement that this is not
> an Oak issue. In Mongo you can control the write concern and make
> replication synchronous, but we cannot do something similar in Oak.
>
> Thanks
>
> On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan <mattr...@apache.org> wrote:
>
> > Hi,
> >
> > I believe your concern is: Content could be uploaded to the cluster via
> > one Oak instance, and your job to process the content runs in a different
> > Oak instance, and that there is a possibility that the job to process the
> > content reads from a MongoDB node that has stale data, so the content is
> > not available yet.
> >
> > If I've understood your concern correctly, you are correct that this is
> > something you have to worry about, that there is a possibility that when
> > the job runs it gets stale data because where it reads from has not been
> > updated yet. However, that's not something being caused by Oak; this would
> > be something you'd have to deal with whether Oak was there or not, no
> > matter what type of backing database cluster was being used.
> >
> > Maybe I'm still missing something in your question. How are you planning
> > to trigger your job?
> >
> > On Fri, Dec 14, 2018 at 1:01 PM ems eril <emsro...@gmail.com> wrote:
> >
> > > Hi Matt,
> > >
> > > I was looking for more details on the inner workings. I came across
> > > this https://markmail.org/message/jbkrsmz3krllqghr where it mentioned that
> > > changes in the cluster would eventually appear across other nodes and this
> > > is not a Mongo-specific issue but something Oak has introduced. I can set
> > > the write concern to majority in Mongo, but if Oak has its own eventual
> > > consistency model this can cause stale reads from other nodes, which would
> > > be a problem with the distributed job I'm trying to create.
> > >
> > > Thanks
> > >
> > > On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan <mattr...@apache.org> wrote:
> > >
> > > > Hi Emily,
> > > >
> > > > Content is stored in Oak in two different configurable storage services.
> > > > This is a bit of an oversimplification, but basically the structure of
> > > > content repository - the content tree, nodes, properties, etc. - is stored
> > > > in a Node Store [0] and the binary content is stored in a Blob Store [1]
> > > > (you'll also sometimes see the term "data store"). Oak manages all of this
> > > > transparently to external clients.
> > > >
> > > > Oak clustering is therefore achieved by configuring Oak instances to use
> > > > clusterable storage services underneath [2]. For the node store, an
> > > > implementation of a DocumentNodeStore [3] is needed; one such
> > > > implementation uses MongoDB [4]. For the blob store, an implementation of
> > > > a SharedDataStore is needed. For example, both the SharedS3DataStore and
> > > > AzureDataStore implementations can be used as a data store for an Oak
> > > > cluster.
> > > >
> > > > So, assume you were using MongoDB and S3. Setting up an Oak cluster then
> > > > merely means that you have more than one Oak instance, each of which is
> > > > configured to use the MongoDB cluster as the node store, and S3 as the data
> > > > store.
> > > >
> > > > [0] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> > > > [1] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> > > > [2] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> > > > [3] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> > > > [4] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
> > > >
> > > > Does that help?
> > > >
> > > > -MR
> > > >
> > > > On Thu, Dec 13, 2018 at 5:52 PM ems eril <emsro...@gmail.com> wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > I'm really interested in understanding how an Oak cluster works and how
> > > > > cluster nodes sync up. These are some of the questions I have:
> > > > >
> > > > > 1) How do the nodes sync?
> > > > > 2) What is Mongo's role?
> > > > > 3) How do indexes in a cluster work and sync up?
> > > > > 4) What is the distributed model: master/slave or multi-master?
> > > > > 5) What is coordinated by the master node?
> > > > > 6) How is the master node elected?
> > > > >
> > > > > One use case I have is to leverage an Oak cluster to upload
> > > > > images/videos and have a consumer on one of the nodes process them
> > > > > in a distributed way. I'd like to avoid unnecessary read checks if
> > > > > possible.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Emily