Hi, I believe your concern is this: content could be uploaded to the cluster via one Oak instance while the job that processes it runs on a different Oak instance, and that job might read from a MongoDB node that has stale data, so the content is not visible to it yet.

If I've understood that correctly, then yes, it is something you have to account for: when the job runs, it may see stale data because the node it reads from has not been updated yet. However, that is not something caused by Oak; you would have to deal with it whether Oak was there or not, no matter what type of backing database cluster was being used. Maybe I'm still missing something in your question, though. How are you planning to trigger your job?
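If the consumer can live inside one of the Oak instances, one option is to drive it from JCR observation events on the instance that will do the processing, so the job only starts once the change is actually visible there. This is just a rough sketch - the listener class, the /content/uploads path, and scheduleProcessing() are placeholders, and as far as I know changes coming from other cluster nodes are delivered to plain JCR listeners as external events, but that is worth double-checking for your Oak version:

    import javax.jcr.Session;
    import javax.jcr.observation.Event;
    import javax.jcr.observation.EventIterator;
    import javax.jcr.observation.EventListener;
    import javax.jcr.observation.ObservationManager;

    public class ProcessUploadsListener implements EventListener {

        // Register on the instance that should process the content.
        public static void register(Session session) throws Exception {
            ObservationManager om = session.getWorkspace().getObservationManager();
            om.addEventListener(new ProcessUploadsListener(),
                    Event.NODE_ADDED,
                    "/content/uploads",  // hypothetical upload location
                    true,                // isDeep: watch the whole subtree
                    null, null,          // no uuid / node type filtering
                    false);              // noLocal: also deliver this session's events
        }

        @Override
        public void onEvent(EventIterator events) {
            while (events.hasNext()) {
                try {
                    // The event is dispatched by this Oak instance, so the
                    // added node is already visible to sessions here.
                    scheduleProcessing(events.nextEvent().getPath());
                } catch (Exception e) {
                    // log and decide whether to retry
                }
            }
        }

        private void scheduleProcessing(String path) {
            // hand off to whatever does the actual image/video processing
        }
    }

The point is simply that the listener fires on the instance that already sees the content, which sidesteps the stale-read window you are worried about.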
On Fri, Dec 14, 2018 at 1:01 PM ems eril <[email protected]> wrote:

> Hi Matt,
>
> I was looking for more details on the inner workings. I came across this
> https://markmail.org/message/jbkrsmz3krllqghr where it mentioned that
> changes in the cluster would eventually appear across other nodes, and
> that this is not a Mongo-specific issue but something Oak has introduced.
> I can set the write concern to majority in Mongo, but if Oak has its own
> eventual consistency model this can cause stale reads from other nodes,
> which would be a problem for the distributed job I'm trying to create.
>
> Thanks
>
> On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan <[email protected]> wrote:
>
> > Hi Emily,
> >
> > Content is stored in Oak in two different configurable storage
> > services. This is a bit of an oversimplification, but basically the
> > structure of the content repository - the content tree, nodes,
> > properties, etc. - is stored in a Node Store [0] and the binary content
> > is stored in a Blob Store [1] (you'll also sometimes see the term "data
> > store"). Oak manages all of this transparently to external clients.
> >
> > Oak clustering is therefore achieved by configuring Oak instances to
> > use clusterable storage services underneath [2]. For the node store,
> > an implementation of a DocumentNodeStore [3] is needed; one such
> > implementation uses MongoDB [4]. For the blob store, an implementation
> > of a SharedDataStore is needed. For example, both the SharedS3DataStore
> > and AzureDataStore implementations can be used as a data store for an
> > Oak cluster.
> >
> > So, assume you were using MongoDB and S3. Setting up an Oak cluster
> > then merely means that you have more than one Oak instance, each of
> > which is configured to use the MongoDB cluster as the node store, and
> > S3 as the data store.
> >
> > [0] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> > [1] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> > [2] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> > [3] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> > [4] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
> >
> > Does that help?
> >
> > -MR
> >
> > On Thu, Dec 13, 2018 at 5:52 PM ems eril <[email protected]> wrote:
> >
> > > Hi Team,
> > >
> > > I'm really interested in understanding how an Oak cluster works and
> > > how cluster nodes sync up. These are some of the questions I have:
> > >
> > > 1) How do the nodes sync?
> > > 2) What is Mongo's role?
> > > 3) How do indexes in a cluster work and sync up?
> > > 4) What is the distributed model - master/slave or multi-master?
> > > 5) What is coordinated by the master node?
> > > 6) How is the master node elected?
> > >
> > > One use case I have is to be able to leverage an Oak cluster to
> > > upload images/videos and have a consumer on one of the nodes process
> > > them in a distributed way. I'd like to try my best to avoid
> > > unnecessary read checks if possible.
> > >
> > > Thanks
> > >
> > > Emily
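P.S. Since the original questions asked how the nodes sync and what Mongo's role is: in code, each cluster member simply builds its node store against the same MongoDB replica set, along the lines of the documentmk docs linked above. This is a rough sketch from memory - the connection string, database name, and cache size are placeholders, and the S3/Azure data store wiring is separate configuration I've left out:

    import javax.jcr.Repository;

    import org.apache.jackrabbit.oak.Oak;
    import org.apache.jackrabbit.oak.jcr.Jcr;
    import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

    public class ClusterMember {

        public static void main(String[] args) {
            // Every Oak instance in the cluster points at the same MongoDB
            // replica set; the hosts and database name here are placeholders.
            DocumentNodeStore ns = new DocumentMK.Builder()
                    .setMongoDB("mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
                            "oak", 16)
                    .getNodeStore();

            // Wrap the shared node store in a JCR repository for this instance.
            Repository repository = new Jcr(new Oak(ns)).createRepository();

            // ... log in, read/write content, register listeners, etc. ...

            // Dispose cleanly on shutdown so this instance's cluster entry is
            // released.
            ns.dispose();
        }
    }

As far as the node store is concerned, each DocumentNodeStore registers itself in MongoDB and the instances coordinate through it, which is why clustering is mostly a matter of pointing every instance at the same backend.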
