Re: How does oak cluster work

2019-01-07 Thread Marcel Reutegger
Hi,

> I'm still very interested in understanding some of the design choices the
> Oak core team has made, and why. What is the use case for long-lived
> snapshots? I would also like to understand how indexes are synced between
> nodes, what the role of an Oak leader is, and how leader election occurs.

One of the primary use cases is asynchronous index updates. The current
index state is always associated with a snapshot of the repository.
Potential updates of asynchronous indexes are checked periodically by
comparing the current repository state with the snapshot referenced in
the index. If an index update is needed, the new index state will
reference the more recent state of the repository and release the older
snapshot.

Most of the time the repository only has to keep rather recent
snapshots, but there may also be cases when a snapshot must be kept
for a longer period of time, e.g. when an index is re-created.
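
The cycle described above can be sketched roughly as follows. This is a toy
model, not Oak code: ToyStore and AsyncIndexer are invented names, the
"repository state" is reduced to a revision counter, and the checkpoint(),
retrieve() and release() methods are only loosely modelled on the NodeStore
checkpoint methods. It is meant to show why the index always pins exactly one
snapshot and releases the older one after an update.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an asynchronous index update cycle; not the Oak implementation. */
class AsyncIndexSketch {

    /** Hypothetical in-memory store: the repository state is just a revision counter. */
    static class ToyStore {
        private long head = 0;    // current repository revision
        private long nextId = 0;  // used to build unique checkpoint references
        private final Map<String, Long> checkpoints = new HashMap<>();

        void commit() { head++; }
        long head() { return head; }

        /** Pins the current revision and returns a reference to it. */
        String checkpoint() {
            String id = "cp-" + (nextId++);
            checkpoints.put(id, head);
            return id;
        }

        /** Returns the pinned revision, or null if the checkpoint is unknown or released. */
        Long retrieve(String id) { return checkpoints.get(id); }

        boolean release(String id) { return checkpoints.remove(id) != null; }

        int retained() { return checkpoints.size(); }
    }

    /** Hypothetical indexer that remembers which snapshot the index reflects. */
    static class AsyncIndexer {
        private final ToyStore store;
        private String indexedCheckpoint; // null before the first run

        AsyncIndexer(ToyStore store) { this.store = store; }

        /** One periodic run. Returns true if the index was updated. */
        boolean runOnce() {
            Long indexed = indexedCheckpoint == null ? null : store.retrieve(indexedCheckpoint);
            if (indexed != null && indexed == store.head()) {
                return false; // repository unchanged since the referenced snapshot
            }
            String newer = store.checkpoint();
            // A real indexer would diff the old and new states here and update the index.
            String old = indexedCheckpoint;
            indexedCheckpoint = newer;   // index now references the more recent state
            if (old != null) {
                store.release(old);      // the older snapshot is no longer needed
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        AsyncIndexer indexer = new AsyncIndexer(store);
        store.commit();
        System.out.println(indexer.runOnce());  // true: first indexing run
        System.out.println(indexer.runOnce());  // false: nothing changed
        store.commit();
        System.out.println(indexer.runOnce());  // true: new state indexed
        System.out.println(store.retained());   // 1: only the latest snapshot is kept
    }
}
```

In this model a re-index would simply hold on to its checkpoint for as long as
the run takes, which is where a longer snapshot lifetime becomes necessary.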

Oak indexes are stored in the repository just like regular content. The
only difference is that the actual index data (e.g. the Lucene files) is
stored on hidden nodes. You don't see those nodes when you access the
repository over the JCR API. The data is managed internally by Oak.

Leader election is not something done by Oak but delegated to another
module. Apache Sling Discovery works well and you can find documentation
here: 
https://sling.apache.org/documentation/bundles/discovery-api-and-impl.html

More general information on Oak is also available here:
https://jackrabbit.apache.org/oak/docs/architecture/nodestate.html

Regards
 Marcel



Re: How does oak cluster work

2019-01-02 Thread ems eril
Hi Team,

I'm still very interested in understanding some of the design choices the
Oak core team has made, and why. What is the use case for long-lived
snapshots? I would also like to understand how indexes are synced between
nodes, what the role of an Oak leader is, and how leader election occurs.

Thanks

Emily



Re: How does oak cluster work

2018-12-20 Thread ems eril
Hi Marcel, thanks for the information. I would love to understand the use
cases for having long-lived snapshots in Oak. Would you be able to provide
specific examples or functions within Oak that need this capability?



Re: How does oak cluster work

2018-12-19 Thread Marcel Reutegger
Hi,

On 18.12.18, 01:55, "ems eril"  wrote:
> 1) Is this a blocking call? Are there any plans for callback or Java Future
> support?

Yes, Clusterable.isVisible() is a blocking call and you can give it a timeout. 
There are no plans right now to add an async variant of this feature.
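
Because the call blocks, an application that wants future-style behaviour can
wrap it itself. The following is a minimal sketch under stated assumptions: it
does not depend on Oak, the local VisibilityCheck interface only mirrors the
getVisibilityToken() and isVisible(token, maxWaitMillis) methods as I read
them from the Clusterable Javadoc, and FakeNode and isVisibleAsync are
invented names for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class VisibilityFutureSketch {

    /** Local stand-in for the two Clusterable methods used here (assumed shape). */
    interface VisibilityCheck {
        String getVisibilityToken();
        boolean isVisible(String visibilityToken, long maxWaitMillis) throws InterruptedException;
    }

    /** Runs the blocking check on a separate executor so the caller thread is not held up. */
    static CompletableFuture<Boolean> isVisibleAsync(VisibilityCheck node, String token,
                                                     long maxWaitMillis, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return node.isVisible(token, maxWaitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // treat an interrupted wait as "not visible yet"
            }
        }, pool);
    }

    /** Invented toy node: every token becomes visible once sync() has been called. */
    static class FakeNode implements VisibilityCheck {
        private final CountDownLatch synced = new CountDownLatch(1);

        public String getVisibilityToken() { return "token-1"; }

        public boolean isVisible(String token, long maxWaitMillis) throws InterruptedException {
            return synced.await(maxWaitMillis, TimeUnit.MILLISECONDS);
        }

        void sync() { synced.countDown(); }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FakeNode node = new FakeNode();
        // In a real cluster the token is taken on the writing node and travels
        // with the job data; here a single toy node plays both roles.
        String token = node.getVisibilityToken();

        CompletableFuture<Void> done = isVisibleAsync(node, token, 5_000, pool)
                .thenAccept(ok -> System.out.println(ok ? "process job" : "retry later"));

        node.sync();   // simulate the changes arriving on the processing node
        done.get();    // wait only so that the demo terminates cleanly
        pool.shutdown();
    }
}
```

Each pending check still occupies one executor thread while it waits, so the
pool size bounds how many jobs can be suspended at the same time.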

> 2) Is there a JCR-level API we can use, as this is currently very low level?

No, there is no JCR/Jackrabbit API equivalent for this feature.

> If not, does Sling have any plans to use this?

You will have to ask this on the Sling list.

> 3) Is there a reason why the document store needs to implement revision
> snapshotting? Why can't we leverage the existing capabilities of the
> underlying databases, such as MongoDB
> (https://docs.mongodb.com/manual/core/wiredtiger/), as most support MVCC?

In Oak we have the requirement to keep a snapshot of the repository for a longer
period of time and not just for concurrency control. E.g. you can create a
checkpoint with a lifetime of several days or even months [0].

Regards
 Marcel

[0] 
https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/NodeStore.html#checkpoint-long-java.util.Map-



Re: How does oak cluster work

2018-12-17 Thread ems eril
Thanks Marcel, this is very helpful. I have a couple of questions about this
interface:

1) Is this a blocking call? Are there any plans for callback or Java Future
support?
2) Is there a JCR-level API we can use, as this is currently very low level?
If not, does Sling have any plans to use this?
3) Is there a reason why the document store needs to implement revision
snapshotting? Why can't we leverage the existing capabilities of the
underlying databases, such as MongoDB
(https://docs.mongodb.com/manual/core/wiredtiger/), as most support MVCC?

Thanks

Emily

On Sun, Dec 16, 2018 at 11:58 PM Marcel Reutegger
 wrote:

> Hi,
>
> There are different ways to approach this in Oak.
>
> Your application can register an event listener and get notified about
> changes when they are visible on the local cluster node.
>
> The application can store a visibility token with the job data you have in
> Kafka. The visibility token concept is described on the Clusterable [0]
> interface, which is an extension to the NodeStore implemented by the
> DocumentNodeStore. On the processing cluster node the visibility token is
> then used to suspend the job until the changes are visible.
>
> Regards
>  Marcel
>
> [0]
> https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/spi/state/Clusterable.html
>
>

Re: How does oak cluster work

2018-12-14 Thread ems eril
Hi Matt,

Yes, you're correct, the job is triggered by a consumer listening to a Kafka
queue. But I have to disagree with your earlier statement that this is not an
Oak issue. In MongoDB you can control the write concern and make replication
synchronous, but we cannot do something similar in Oak.

Thanks

On Fri, Dec 14, 2018 at 3:25 PM Matt Ryan  wrote:

> Hi,
>
> I believe your concern is:  Content could be uploaded to the cluster via
> one Oak instance, and your job to process the content runs in a different
> Oak instance, and that there is a possibility that the job to process the
> content reads from a MongoDB node that has stale data, so the content is
> not available yet.
>
> If I've understood your concern correctly, you are correct that this is
> something you have to worry about, that there is a possibility that when
> the job runs it gets stale data because where it reads from has not been
> updated yet.  However, that's not something being caused by Oak; this would
> be something you'd have to deal with whether Oak was there or not, no
> matter what type of backing database cluster was being used.
>
> Maybe I'm still missing something in your question.  How are you planning
> to trigger your job?


Re: How does oak cluster work

2018-12-14 Thread ems eril
Hi Matt,

I was looking for more details on the inner workings. I came across
this https://markmail.org/message/jbkrsmz3krllqghr where it is mentioned that
changes in the cluster eventually appear on other nodes, and that this is not
a MongoDB-specific issue but something Oak has introduced. I can set the
write concern to majority in MongoDB, but if Oak has its own eventual
consistency model, this can cause stale reads from other nodes, which would
be a problem for the distributed job I'm trying to create.

Thanks



Re: How does oak cluster work

2018-12-14 Thread Matt Ryan
Hi Emily,

Content is stored in Oak in two different configurable storage services.
This is a bit of an oversimplification, but basically the structure of the
content repository - the content tree, nodes, properties, etc. - is stored
in a Node Store [0] and the binary content is stored in a Blob Store [1]
(you'll also sometimes see the term "data store").  Oak manages all of this
transparently to external clients.

Oak clustering is therefore achieved by configuring Oak instances to use
clusterable storage services underneath [2].  For the node store, an
implementation of a DocumentNodeStore [3] is needed; one such
implementation uses MongoDB [4].  For the blob store, an implementation of
a SharedDataStore is needed.  For example, both the SharedS3DataStore and
AzureDataStore implementations can be used as a data store for an Oak
cluster.

So, assume you were using MongoDB and S3.  Setting up an Oak cluster then
merely means that you have more than one Oak instance, each of which is
configured to use the MongoDB cluster as the node store, and S3 as the data
store.


[0] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/overview.md
[1] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
[2] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/clustering.md
[3] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/documentmk.md
[4] -
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-doc/src/site/markdown/nodestore/document/mongo-document-store.md


Does that help?


-MR

On Thu, Dec 13, 2018 at 5:52 PM ems eril  wrote:

> Hi Team ,
>
> I'm really interested in understanding how an Oak cluster works and how
> cluster nodes sync up. These are some of the questions I have:
>
> 1) How do the nodes sync?
> 2) What is the role of MongoDB?
> 3) How do indexes in a cluster work and sync up?
> 4) What is the distributed model: master/slave or multi-master?
> 5) What is coordinated by the master node?
> 6) How is the master node elected?
>
> One use case I have is to leverage an Oak cluster to upload images/videos
> and have a consumer on one of the nodes process them in a distributed way.
> I would like to avoid unnecessary read checks if possible.
>
> Thanks
>
> Emily
>