Re: How does oak cluster work

2019-01-02 Thread ems eril
Hi Team,

I'm still very interested in understanding some of the design choices the
Oak core team has made, and why. What is the use case for long-lived
snapshots? I would also like to understand how indexes are synced between
nodes, the role of the Oak leader, and how leader election occurs.

Thanks

Emily



Re: How does oak cluster work

2018-12-20 Thread ems eril
Hi Marcel, thanks for the information. I would love to understand the use
cases for long-lived snapshots in Oak. Could you provide specific examples
of functions within Oak that need this capability?

On Wed, Dec 19, 2018 at 12:43 AM Marcel Reutegger wrote:

> Hi,
>
> On 18.12.18, 01:55, "ems eril" wrote:
> > 1) Is this a blocking call? Are there any plans for callback or Java
> > Future support?
>
> Yes, Clusterable.isVisible() is a blocking call and you can give it a
> timeout. There are no plans right now to add an async variant of this
> feature.
>
> > 2) Is there any JCR-level API we can use, as this one is currently very
> > low level?
>
> No, there is no JCR/Jackrabbit API equivalent for this feature.
>
> > If not, does Sling have any plans to use this?
>
> You will have to ask this on the Sling list.
>
> > 3) Is there any reason why the DocumentStore needs to implement revision
> > snapshotting? Why can't we leverage existing database capabilities, such
> > as MongoDB's WiredTiger (https://docs.mongodb.com/manual/core/wiredtiger/),
> > since most databases support MVCC?
>
> In Oak we have the requirement to keep a snapshot of the repository for a
> longer period of time and not just for concurrency control. E.g. you can
> create a checkpoint with a lifetime of several days or even months [0].
>
> Regards
>  Marcel
>
> [0]
> https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeStore.html#checkpoint-long-java.util.Map-
>
>
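
Since the reply above says the visibility check is blocking and no async
variant is planned, an application could wrap the blocking call itself. The
following is only a sketch under stated assumptions: BlockingCheck is a
hypothetical stand-in declared here, not an Oak type, and the lambda in
demo() fakes the check instead of calling a real repository.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a blocking, timeout-bounded visibility check. */
interface BlockingCheck {
    boolean await(String token, long maxWaitMillis) throws InterruptedException;
}

class AsyncVisibility {

    /** Runs the blocking check on the given executor and exposes the result as a future. */
    static CompletableFuture<Boolean> isVisibleAsync(
            BlockingCheck check, String token, long maxWaitMillis, ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return check.await(token, maxWaitMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report "not visible".
                Thread.currentThread().interrupt();
                return false;
            }
        }, executor);
    }

    /** Demo with a fake check that blocks briefly and then reports the change as visible. */
    static boolean demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            BlockingCheck fake = (token, maxWait) -> {
                Thread.sleep(20); // simulate waiting for the change to arrive
                return true;
            };
            return isVisibleAsync(fake, "token-1", 1000, pool).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("visible: " + demo());
    }
}
```

The blocking call still occupies a thread for the whole wait, so the
executor should be sized for the number of jobs that may be waiting at once.
Callers can chain thenAccept(...) on the returned future instead of calling
get(...).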


Re: How does oak cluster work

2018-12-17 Thread ems eril
Thanks Marcel, this is very helpful. I have a few questions about this
interface:

1) Is this a blocking call? Are there any plans for callback or Java Future
support?
2) Is there any JCR-level API we can use, as this one is currently very low
level? If not, does Sling have any plans to use this?
3) Is there any reason why the DocumentStore needs to implement revision
snapshotting? Why can't we leverage existing database capabilities, such as
MongoDB's WiredTiger (https://docs.mongodb.com/manual/core/wiredtiger/),
since most databases support MVCC?

Thanks

Emily

On Sun, Dec 16, 2018 at 11:58 PM Marcel Reutegger wrote:

> Hi,
>
> There are different ways to approach this in Oak.
>
> Your application can register an event listener and get notified about
> changes when they are visible on the local cluster node.
>
> The application can store a visibility token with the job data you have in
> Kafka. The visibility token concept is described on the Clusterable [0]
> interface, which is an extension to the NodeStore implemented by the
> DocumentNodeStore. On the processing cluster node the visibility token is
> then used to suspend the job until the changes are visible.
>
> Regards
>  Marcel
>
> [0]
> https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/Clusterable.html
>
>
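
The visibility-token hand-off described in the reply above can be sketched
roughly as follows. This is a toy simulation, not Oak code: VisibilityGate
is a local stand-in modeled on the two methods the thread mentions
(getVisibilityToken and isVisible with a timeout), ToyNode and its revision
counter are invented for illustration, and the token format here is an
assumption with no relation to Oak's real one.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Local stand-in for the token methods discussed in the thread (not the Oak interface). */
interface VisibilityGate {
    String getVisibilityToken();
    boolean isVisible(String visibilityToken, long maxWaitMillis) throws InterruptedException;
}

/** Toy cluster node: a change is "visible" once the node has seen its revision. */
class ToyNode implements VisibilityGate {
    private final AtomicLong clusterHead; // latest revision written anywhere in the toy cluster
    private long seen;                    // latest revision this node knows about

    ToyNode(AtomicLong clusterHead) { this.clusterHead = clusterHead; }

    /** A local write is immediately visible on the writing node. */
    synchronized void write() { seen = clusterHead.incrementAndGet(); }

    /** Simulates the periodic background sync that picks up remote changes. */
    synchronized void backgroundRead() {
        seen = Math.max(seen, clusterHead.get());
        notifyAll();
    }

    public synchronized String getVisibilityToken() { return Long.toString(seen); }

    public synchronized boolean isVisible(String token, long maxWaitMillis)
            throws InterruptedException {
        long wanted = Long.parseLong(token);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (seen < wanted) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) return false; // timed out, change still not visible
            wait(remainingMs);
        }
        return true;
    }
}

class VisibilityTokenDemo {
    /** Returns { visible before sync, visible after sync } as seen by the worker node. */
    static boolean[] run() {
        try {
            AtomicLong head = new AtomicLong();
            ToyNode uploader = new ToyNode(head);
            ToyNode worker = new ToyNode(head);

            uploader.write();
            String token = uploader.getVisibilityToken(); // would be stored with the job data

            boolean before = worker.isVisible(token, 50); // worker has not synced yet
            Thread sync = new Thread(worker::backgroundRead);
            sync.start();
            boolean after = worker.isVisible(token, 2000); // returns once the sync has run
            sync.join();
            return new boolean[] { before, after };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("visible before sync: " + r[0] + ", after sync: " + r[1]);
    }
}
```

In the demo the first check times out because the worker has not yet synced,
and the second succeeds once the background read has run. In a real setup
the token would travel with the Kafka message and the consumer would make
the wait its first step before reading the uploaded content.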

Re: How does oak cluster work

2018-12-14 Thread ems eril
Hi Matt,

Yes, you're correct: the job is triggered by a consumer listening to a Kafka
queue. But I have to disagree with your earlier statement that this is not
an Oak issue. In MongoDB you can control the write concern and make
replication synchronous, but we cannot do anything similar in Oak.

Thanks

On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan  wrote:

> Hi,
>
> I believe your concern is:  Content could be uploaded to the cluster via
> one Oak instance, and your job to process the content runs in a different
> Oak instance, and that there is a possibility that the job to process the
> content reads from a MongoDB node that has stale data, so the content is
> not available yet.
>
> If I've understood your concern correctly, you are correct that this is
> something you have to worry about, that there is a possibility that when
> the job runs it gets stale data because where it reads from has not been
> updated yet.  However, that's not something being caused by Oak; this would
> be something you'd have to deal with whether Oak was there or not, no
> matter what type of backing database cluster was being used.
>
> Maybe I'm still missing something in your question.  How are you planning
> to trigger your job?


Re: How does oak cluster work

2018-12-14 Thread ems eril
Hi Matt,

I was looking for more details on the inner workings. I came across
https://markmail.org/message/jbkrsmz3krllqghr, which mentions that changes
in the cluster eventually appear on the other nodes, and that this is not
MongoDB-specific but something Oak has introduced. I can set the write
concern to majority in MongoDB, but if Oak has its own eventual consistency
model, this can cause stale reads from other nodes, which would be a problem
for the distributed job I'm trying to create.

Thanks

On Fri, Dec 14, 2018 at 8:02 AM Matt Ryan  wrote:

> Hi Emily,
>
> Content is stored in Oak in two different configurable storage services.
> This is a bit of an oversimplification, but basically the structure of
> the content repository - the content tree, nodes, properties, etc. - is stored
> in a Node Store [0] and the binary content is stored in a Blob Store [1]
> (you'll also sometimes see the term "data store").  Oak manages all of this
> transparently to external clients.
>
> Oak clustering is therefore achieved by configuring Oak instances to use
> clusterable storage services underneath [2].  For the node store, an
> implementation of a DocumentNodeStore [3] is needed; one such
> implementation uses MongoDB [4].  For the blob store, an implementation of
> a SharedDataStore is needed.  For example, both the SharedS3DataStore and
> AzureDataStore implementations can be used as a data store for an Oak
> cluster.
>
> So, assume you were using MongoDB and S3.  Setting up an Oak cluster then
> merely means that you have more than one Oak instance, each of which is
> configured to use the MongoDB cluster as the node store, and S3 as the data
> store.
>
>
> [0] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
> [1] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
> [2] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
> [3] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
> [4] - https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md
>
>
> Does that help?
>
>
> -MR
>
> On Thu, Dec 13, 2018 at 5:52 PM ems eril  wrote:
>
> > Hi Team,
> >
> > I'm really interested in understanding how an Oak cluster works and how
> > cluster nodes sync up. These are some of the questions I have:
> >
> > 1) How do the nodes sync?
> > 2) What is MongoDB's role?
> > 3) How do indexes work and sync up in a cluster?
> > 4) What is the distribution model: master/slave or multi-master?
> > 5) What is coordinated by the master node?
> > 6) How is the master node elected?
> >
> > One use case I have is to use an Oak cluster to upload images/videos
> > and have a consumer on one of the nodes process them in a distributed
> > way. I would like to avoid unnecessary read checks if possible.
> >
> > Thanks
> >
> > Emily
> >
>