Hi Matt,

Yes, you're correct: the job is triggered by a consumer listening to a
Kafka topic. But I have to disagree with your earlier statement that this
is not an Oak issue. In MongoDB you can control the write concern and make
replication synchronous, but we cannot do something similar in Oak.
Thanks

On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan <mattr...@apache.org> wrote:

> Hi,
>
> I believe your concern is: content could be uploaded to the cluster via
> one Oak instance, your job to process the content runs in a different
> Oak instance, and there is a possibility that the job reads from a
> MongoDB node with stale data, so the content is not available yet.
>
> If I've understood your concern correctly, you are right that this is
> something you have to worry about: when the job runs it may get stale
> data because the node it reads from has not been updated yet. However,
> that's not something caused by Oak; you would have to deal with it
> whether Oak was there or not, no matter what type of backing database
> cluster was being used.
>
> Maybe I'm still missing something in your question. How are you
> planning to trigger your job?
>
> On Fri, Dec 14, 2018 at 1:01 PM ems eril <emsro...@gmail.com> wrote:
>
> > Hi Matt,
> >
> > I was looking for more details on the inner workings. I came across
> > https://markmail.org/message/jbkrsmz3krllqghr, which mentions that
> > changes in the cluster eventually appear across other nodes and that
> > this is not a Mongo-specific issue but something Oak has introduced.
> > I can set the write concern to majority in Mongo, but if Oak has its
> > own eventual consistency model, that can cause stale reads from other
> > nodes, which would be a problem for the distributed job I'm trying
> > to create.
> >
> > Thanks
> >
> > On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan <mattr...@apache.org> wrote:
> >
> > > Hi Emily,
> > >
> > > Content is stored in Oak in two different configurable storage
> > > services. This is a bit of an oversimplification, but basically the
> > > structure of the content repository - the content tree, nodes,
> > > properties, etc. - is stored in a Node Store [0], and the binary
> > > content is stored in a Blob Store [1] (you'll also sometimes see
> > > the term "data store"). Oak manages all of this transparently to
> > > external clients.
> > >
> > > Oak clustering is therefore achieved by configuring Oak instances
> > > to use clusterable storage services underneath [2]. For the node
> > > store, an implementation of a DocumentNodeStore [3] is needed; one
> > > such implementation uses MongoDB [4]. For the blob store, an
> > > implementation of a SharedDataStore is needed. For example, both
> > > the SharedS3DataStore and AzureDataStore implementations can be
> > > used as a data store for an Oak cluster.
> > >
> > > So, assume you were using MongoDB and S3. Setting up an Oak cluster
> > > then merely means that you have more than one Oak instance, each of
> > > which is configured to use the MongoDB cluster as the node store
> > > and S3 as the data store.
> > >
> > > [0] -
> > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> > > [1] -
> > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> > > [2] -
> > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> > > [3] -
> > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> > > [4] -
> > > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
> > >
> > > Does that help?
> > >
> > > -MR
> > >
> > > On Thu, Dec 13, 2018 at 5:52 PM ems eril <emsro...@gmail.com> wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I'm really interested in understanding how an Oak cluster works
> > > > and how cluster nodes sync up.
> > > > These are some of the questions I have:
> > > >
> > > > 1) How do the nodes sync?
> > > > 2) What is Mongo's role?
> > > > 3) How do indexes in a cluster work and sync up?
> > > > 4) What is the distributed model - master/slave or multi-master?
> > > > 5) What is coordinated by the master node?
> > > > 6) How is the master node elected?
> > > >
> > > > One use case I have is to leverage an Oak cluster to upload
> > > > images/videos and have a consumer on one of the nodes process
> > > > them in a distributed way. I'd like to avoid unnecessary read
> > > > checks if possible.
> > > >
> > > > Thanks
> > > >
> > > > Emily
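Since Oak's changes only eventually appear on other cluster nodes, one
practical workaround for the Kafka-triggered job is to poll for the
uploaded content with bounded retries before processing it. A minimal,
hypothetical sketch - `lookup`, the path, and the timings are placeholders
standing in for whatever repository read the consumer actually performs,
not Oak API:

```python
import time

def wait_for_content(lookup, path, attempts=5, delay=0.5):
    """Poll lookup(path) until it returns content or retries run out.

    With eventual consistency, the first reads on this node may miss
    content that was written via a different cluster node.
    """
    for attempt in range(attempts):
        content = lookup(path)
        if content is not None:
            return content
        time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise TimeoutError(f"content never appeared at {path!r}")

# Simulated store: content "replicates" to this node after two
# stale reads, then becomes visible on the third.
reads = {"count": 0}
def fake_lookup(path):
    reads["count"] += 1
    return b"image-bytes" if reads["count"] >= 3 else None

print(wait_for_content(fake_lookup, "/content/images/cat.jpg", delay=0.01))
# prints b'image-bytes'
```

This doesn't remove the consistency window, but it keeps the consumer from
failing on content that simply hasn't arrived on its node yet, which may
be the cheapest way to avoid the "unnecessary read checks" Emily mentions.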