Thanks Marcel, this is very helpful. A couple of questions I have about this
interface:

1) Is this a blocking call? Are there any plans for callback or Java Future
support?
2) Is there a JCR-level API we can use, since this one is currently very low
level? If not, does Sling have any plans to use this?
3) Is there any reason why the document store needs to implement its own
revision snapshotting? Why can't we leverage existing document store database
capabilities such as MongoDB's MVCC support in WiredTiger
(https://docs.mongodb.com/manual/core/wiredtiger/), as most document stores
support MVCC?

Thanks

Emily

On Sun, Dec 16, 2018 at 11:58 PM Marcel Reutegger
<mreut...@adobe.com.invalid> wrote:

> Hi,
>
> There are different ways to approach this in Oak.
>
> Your application can register an event listener and get notified about
> changes when they are visible on the local cluster node.
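>
> A rough sketch of that first approach with the plain JCR observation API
> (assuming an existing javax.jcr.Session named session; the observed path and
> the handling below are just placeholders):
>
>     ObservationManager om = session.getWorkspace().getObservationManager();
>     om.addEventListener(
>         events -> {                      // javax.jcr.observation.EventListener
>             while (events.hasNext()) {
>                 Event e = events.nextEvent();
>                 // the change is visible on this cluster node once the
>                 // listener fires; e.getPath() identifies the changed node
>             }
>         },
>         Event.NODE_ADDED,                // event types of interest
>         "/content/uploads",              // placeholder subtree to observe
>         true,                            // isDeep
>         null, null,                      // uuid / nodeType filters
>         false);                          // noLocal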
>
> The application can store a visibility token with the job data you have in
> Kafka. The visibility token concept is described on the Clusterable [0]
> interface, which is an extension to the NodeStore implemented by the
> DocumentNodeStore. On the processing cluster node the visibility token is
> then used to suspend the job until the changes are visible.
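>
> Roughly, a sketch of how the two sides could use that (method names as per
> the Clusterable Javadoc linked below; ns stands for the DocumentNodeStore
> instance, and the Kafka plumbing and timeout are placeholders):
>
>     // producing cluster node, after the content has been written:
>     String token = ((Clusterable) ns).getVisibilityToken();
>     // ... store the token in Kafka together with the job payload ...
>
>     // processing cluster node, before running the job:
>     boolean visible = ((Clusterable) ns)
>         .isVisible(token, TimeUnit.SECONDS.toMillis(30));  // throws InterruptedException
>     if (visible) {
>         // the producer's changes are visible here, safe to run the job
>     } else {
>         // timed out waiting, re-queue the job or retry later
>     }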
>
> Regards
>  Marcel
>
> [0]
> https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/Clusterable.html
>
>
> On 15.12.18, 02:23, "ems eril" <emsro...@gmail.com> wrote:
>
>     Hi Matt,
>
>       Yes, you're correct, the job is triggered by a consumer listening to a
>     Kafka queue. But I have to disagree with your earlier statement that this
>     is not an Oak issue. In MongoDB you can control the write concern and make
>     replication synchronous, but we cannot do something similar in Oak.
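>
>     For reference, roughly what that looks like with the MongoDB Java driver
>     (database and collection names are just placeholders):
>
>         // com.mongodb.client.*, com.mongodb.WriteConcern, com.mongodb.ReadConcern
>         MongoClient client = MongoClients.create("mongodb://localhost:27017");
>         MongoCollection<Document> jobs = client.getDatabase("app")
>             .getCollection("jobs")
>             .withWriteConcern(WriteConcern.MAJORITY)  // ack after a majority of replicas
>             .withReadConcern(ReadConcern.MAJORITY);   // read only majority-committed data
>         jobs.insertOne(new Document("state", "uploaded"));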
>
>     Thanks
>
>     On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan <mattr...@apache.org> wrote:
>
>     > Hi,
>     >
>     > I believe your concern is: content could be uploaded to the cluster via
>     > one Oak instance while your job to process the content runs in a
>     > different Oak instance, and there is a possibility that the job reads
>     > from a MongoDB node that has stale data, so the content is not available
>     > yet.
>     >
>     > If I've understood your concern correctly, you are correct that this is
>     > something you have to worry about: there is a possibility that when the
>     > job runs it gets stale data, because where it reads from has not been
>     > updated yet. However, that's not something being caused by Oak; this
>     > would be something you'd have to deal with whether Oak was there or not,
>     > no matter what type of backing database cluster was being used.
>     >
>     > Maybe I'm still missing something in your question. How are you planning
>     > to trigger your job?
>     >
>     >
>     >
>     > On Fri, Dec 14, 2018 at 1:01 PM ems eril <emsro...@gmail.com> wrote:
>     >
>     > > Hi Matt,
>     > >
>     > >    I was looking for more details on the inner workings. I came across
>     > > https://markmail.org/message/jbkrsmz3krllqghr where it is mentioned
>     > > that changes in the cluster only eventually appear across the other
>     > > nodes, and that this is not a MongoDB-specific issue but something Oak
>     > > has introduced. I can set the write concern to majority in MongoDB, but
>     > > if Oak has its own eventual consistency model this can cause stale
>     > > reads from other nodes, which would be a problem for the distributed
>     > > job I'm trying to create.
>     > >
>     > > Thanks
>     > >
>     > > On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan <mattr...@apache.org> wrote:
>     > >
>     > > > Hi Emily,
>     > > >
>     > > > Content is stored in Oak in two different configurable storage
>     > > > services. This is a bit of an oversimplification, but basically the
>     > > > structure of the content repository - the content tree, nodes,
>     > > > properties, etc. - is stored in a Node Store [0] and the binary
>     > > > content is stored in a Blob Store [1] (you'll also sometimes see the
>     > > > term "data store"). Oak manages all of this transparently to
>     > > > external clients.
>     > > >
>     > > > Oak clustering is therefore achieved by configuring Oak instances to
>     > > > use clusterable storage services underneath [2]. For the node store,
>     > > > an implementation of a DocumentNodeStore [3] is needed; one such
>     > > > implementation uses MongoDB [4]. For the blob store, an
>     > > > implementation of a SharedDataStore is needed. For example, both the
>     > > > SharedS3DataStore and AzureDataStore implementations can be used as
>     > > > a data store for an Oak cluster.
>     > > >
>     > > > So, assume you were using MongoDB and S3. Setting up an Oak cluster
>     > > > then merely means that you have more than one Oak instance, each of
>     > > > which is configured to use the MongoDB cluster as the node store and
>     > > > S3 as the data store.
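>     > > >
>     > > > A minimal sketch of what one such cluster member could look like in
>     > > > code, roughly following the documentation in [3]/[4] (connection
>     > > > string, database name and cache size are placeholders):
>     > > >
>     > > >     DocumentNodeStore ns = MongoDocumentNodeStoreBuilder
>     > > >         .newMongoDocumentNodeStoreBuilder()
>     > > >         .setMongoDB("mongodb://mongo-rs0:27017", "oak", 16)
>     > > >         .build();
>     > > >     Repository repo = new Jcr(new Oak(ns)).createRepository();
>     > > >     // every Oak instance in the cluster points at the same MongoDB
>     > > >     // replica set; the shared S3/Azure data store is configured
>     > > >     // separately for the blob store
>     > > >     ns.dispose();  // on shutdown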
>     > > >
>     > > >
>     > > > [0] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
>     > > > [1] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
>     > > > [2] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
>     > > > [3] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
>     > > > [4] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
>     > > >
>     > > >
>     > > > Does that help?
>     > > >
>     > > >
>     > > > -MR
>     > > >
>     > > > On Thu, Dec 13, 2018 at 5:52 PM ems eril <emsro...@gmail.com> wrote:
>     > > >
>     > > > > Hi Team,
>     > > > >
>     > > > >    I'm really interested in understanding how an Oak cluster works
>     > > > > and how the cluster nodes sync up. These are some of the questions
>     > > > > I have:
>     > > > >
>     > > > > 1) How do the nodes sync?
>     > > > > 2) What is MongoDB's role?
>     > > > > 3) How do indexes work and sync up in a cluster?
>     > > > > 4) What is the distribution model: master/slave or multi-master?
>     > > > > 5) What is coordinated by the master node?
>     > > > > 6) How is the master node elected?
>     > > > >
>     > > > >    One use case I have is to leverage an Oak cluster to upload
>     > > > > images/videos and have a consumer on one of the nodes process them
>     > > > > in a distributed way. I would like to avoid unnecessary read checks
>     > > > > if possible.
>     > > > >
>     > > > > Thanks
>     > > > >
>     > > > > Emily
>     > > > >
>     > > >
>     > >
>     >
>
>
>
