Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-16 Thread Ard Schrijvers
Hey,

I've caught up with all mails in this thread, and would like to make
some general remarks. Admittedly, I do not yet work with Oak and do
not yet know much about its indexing strategy/implementation, but I
do know quite a few details about the old JR2 index implementation,
about ES, about Lucene and about JCR in general.

I agree with Ian that in the past every attempt to store the Lucene
index away from the code has failed. I think he forgot to mention
Lucandra :-). About 8 years ago Simon Willnauer was pretty explicit
about this in a conversation with me: bring the computation to the data
with Lucene; every other attempt will fail. I also talked with Jukka (5
years ago?) when he explained the Oak indexing setup to me. I asked him
how this could work, because bringing the data to the code
(during query execution) doesn't perform. Obviously Jukka was aware.

AFAIU, Oak keeps a local copy of the Lucene segments fetched from the
storage (MongoDB). So it doesn't bring the data to the computation during
query execution: the Lucene data is local. In this sense, I think Ian's
fear about the Lucene index not being local is unfounded (it is confusing:
the index is stored externally, but when used it is copied locally... that
is at least what I understand)

With respect to using ES (with sharding), embedded in Oak or not, I
consider the crux of the requirement well explained by Chetan:

QR1. Application Query - These mostly involve some property
restrictions and are invoked by code itself to perform some operation.


QR2. User provided query - These queries would consist of both or
either of property restriction and fulltext constraints. 

With ES (with sharding), QR1-type queries will never be fast
enough. We (Hippo) have code that can result in hundreds of queries
for a single request (for example: for every document in a folder, show
the translations of that document). In JR2, simple queries return
within 1 ms (and faster). You'll never be able to deliver this with ES
(clustered with sharding); the network latency alone is orders of
magnitude higher. Obviously I do *not* claim that ES has a worse Lucene
implementation than JR2 has. Quite surely the opposite, but the
implementation serves a very different purpose. It is like comparing a
ConcurrentHashMap used as a cache with a Terracotta cluster-wide cache.
Some use cases require the one, some the other.

Also, something I did not see mentioned in this thread: authorization
(aka fine-grained ACLs). If you include the ACL requirements, using an ES
index (with sharding) becomes even more problematic: how many query
results do you fetch if you don't know how many the user is allowed to
read? What if you want to return 1,000 hits, but the JCR user has read
access to only about 1% of the repository? Fetch 100,000 hits from ES?
And then 100,000 more if you did not find 1,000 authorized ones?

In JR2, at Hippo we combine every query with a 'Lucene
authorization query'. This authorization query easily becomes a nested
boolean query with hundreds of boolean queries inside. The only way
this performs is by using a caching Lucene filter [1]. I doubt this is
possible with ES (perhaps with a custom endpoint and some token that
maps to an authorization query).
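For reference, a simplified sketch of that pattern against the JR2-era
(Lucene 3.x) API; the actual Hippo filter is more involved:

    import java.io.IOException;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TopDocs;

    class AuthorizedSearcher {
        // authQuery: the (potentially huge) nested BooleanQuery encoding what
        // the session may read. CachingWrapperFilter computes the matching
        // bitset once per index reader and reuses it for every later query.
        private final Filter authFilter;

        AuthorizedSearcher(Query authQuery) {
            this.authFilter = new CachingWrapperFilter(new QueryWrapperFilter(authQuery));
        }

        TopDocs search(IndexSearcher searcher, Query userQuery) throws IOException {
            // Intersect the user's query with the cached authorization bitset.
            return searcher.search(userQuery, authFilter, 1000);
        }
    }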
Either way, long story short: I think ES serves certain use cases much,
much better than JR2 or Oak ever will. At Hippo we store, for example,
every visitor's page requests, including metadata, in ES to support trend
analysis. ES is perfect for this. I'd never want to store this in a
hierarchical content structure, with versioning, with eventual
consistency, with ACL support, with support for moving subtrees, etc. But
it is exactly these features that imho make ES in turn unsuited for
supporting the QR1 type of queries for JCR.

As far as I can judge, the hybrid approach suggested by Chetan makes sense
to me. Next to that, ES support for QR2-type queries makes sense (possibly
with a delay, because those are less application-driven queries). However,
I consider ES support more an integration feature than a core Oak
requirement.

Some general other remarks:

Some mails argued that text extraction is expensive, and that this
justifies having the index in the database. I don't fully agree. Text
extraction is only expensive for (some) binaries, most notably PDFs.
At Hippo we therefore store a sibling of jcr:data, namely the binary
'hippo:text'. If hippo:text is present, we do not extract the jcr:data but
use the hippo:text binary (which is the extracted text, and thus only
needs to be extracted once). With this kind of approach, text extraction
also happens only once. This does not require an index to be stored in
the repository.
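A minimal sketch of that pattern, assuming Tika for the extraction (the
property names are Hippo's; the surrounding class is illustrative):

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.jcr.Binary;
    import javax.jcr.Node;
    import org.apache.commons.io.IOUtils;
    import org.apache.tika.Tika;

    class TextExtractor {
        // Extract full text once, then persist it next to jcr:data as
        // hippo:text so later (re)indexing never runs Tika on this binary again.
        String fullTextOf(Node resource) throws Exception {
            if (resource.hasProperty("hippo:text")) {
                // Already extracted once: reuse the stored text binary.
                return IOUtils.toString(
                        resource.getProperty("hippo:text").getBinary().getStream(),
                        StandardCharsets.UTF_8);
            }
            String text = new Tika().parseToString(
                    resource.getProperty("jcr:data").getBinary().getStream());
            Binary textBinary = resource.getSession().getValueFactory().createBinary(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            resource.setProperty("hippo:text", textBinary);
            resource.getSession().save();
            return text;
        }
    }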

Some mention was made of JR2 that when a cluster node crashes, its
index might be corrupt. Perhaps when the node crashes because the disk is
full, but otherwise the index is in general not corrupt: namely, there
is a redo.log file on the FS which contains the JCR nodes which are
indexed in the 'in memory index' which is not yet flushed to 

Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-11 Thread Ian Boston
Hi,

On 11 August 2016 at 13:03, Chetan Mehrotra 
wrote:

> On Thu, Aug 11, 2016 at 5:19 PM, Ian Boston  wrote:
> > correct.
> > Documents are sharded by ID so all updates hit the same shard.
> > That may result in network traffic if the shard is not local.
>
> Focusing on the ordering part, as that is the most critical aspect
> compared to the others. (Backup and restore with a sharded index is a
> separate problem, to be discussed later.)
>
> So even if there is a single master for a given path, how would it
> order the changes, given that local changes only give a partial view of
> the end state?
>

In theory, the index should be driven by the eventual consistency of the
source repository, eventually reaching the same consistent state, and
updating on each state change. That probably means the queue should only
contain pointers to Documents and only index each Document as retrieved. I
don't know if that can ever work.
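A sketch of what I mean, with all the interfaces as hypothetical stand-ins
(none of this is Oak or ES API): producers enqueue only ids, and the
indexer reads the latest repository state at index time, so the index
converges on whatever state the repository eventually agrees on:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    interface Repo { String readCurrent(String id); }  // latest state, null if gone
    interface Index { void write(String id, String doc); void delete(String id); }

    class PointerQueueIndexer implements Runnable {
        private final BlockingQueue<String> dirty = new LinkedBlockingQueue<>();
        private final Repo repo;
        private final Index index;

        PointerQueueIndexer(Repo repo, Index index) {
            this.repo = repo;
            this.index = index;
        }

        // Producers enqueue only the pointer; the state is deliberately
        // NOT captured at enqueue time.
        void onChange(String id) { dirty.offer(id); }

        @Override public void run() {
            try {
                for (;;) {
                    String id = dirty.take();
                    String doc = repo.readCurrent(id); // index the state as retrieved *now*
                    if (doc == null) index.delete(id); else index.write(id, doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop the indexing loop
            }
        }
    }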


>
> Also, in such a setup, would each query need to consider multiple shards
> for the final result, or would each node "eventually" sync index changes
> from other nodes (complete replication) so that a query only uses the
> local index?
>
> For me, ensuring consistency between the index updates sent to ES and
> Oak's view of changes was the kind of blocking feature that prevented
> parallelization of the indexing process. It needs to be ensured that for
> concurrent commits the end result in the index is in sync with the
> repository state.
>

Agreed; likewise, based on my own various attempts.


>
> The current single-threaded async index update avoids all such race conditions.
>

Perhaps this is the "root" of the problem. The only way to index Oak
consistently is with a single thread globally, as is done now.

That's still possible with ES.
Run a single thread on the master that indexes into a co-located ES
cluster.
If the full text extraction is distributed, then the master only needs to
spend resources on writing the local shard.
It's not as good as parallelising the queue, but given the structure of
Oak it might be the only way.

Even so, future revisions will be in the index long before Oak has synced
the root document.

The current implementation doesn't have to think about this, as the
indexing is single-threaded globally *and* each segment update is committed
first by a hard Lucene commit and second by a root document sync,
guaranteeing the sequential update nature.

BTW, how does Hybrid manage to parallelise the indexing and maintain
consistency?

Best Regards
Ian



>
> Chetan Mehrotra
>


Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-11 Thread Ian Boston
On 11 August 2016 at 11:10, Chetan Mehrotra 
wrote:

> On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston  wrote:
> > Both Solr Cloud and ES address this by sharding and
> > replicating the indexes, so that all commits are soft, instant and real
> > time. That introduces problems.
> ...
> > Both Solr Cloud and ES address this by sharding and
> > replicating the indexes, so that all commits are soft, instant and real
> > time.
>
> This would really be useful. However I have a couple of aspects to clarify
>
> Index Update Guarantee
> 
>
> Let's say a commit succeeds and then we update the index, and the index
> update fails for some reason. Would that update be missed, or
> can there be some mechanism to recover? I am not very sure about the WAL
> here; it may be the answer, but still confirming.
>

For ES (I don't know how the Solr Cloud WAL behaves):
An update isn't accepted until it's written to the WAL, so if something
fails before that, it's up to how the queue of updates is managed, which
is client side.
If it's written to the WAL, whatever happens it will be indexed eventually,
provided the WAL is available. Think of the WAL as equivalent to the Oak
journal, IIUC. The WAL is present on all replicas, so provided 1 replica of
a shard is available, no data is lost.
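If that guarantee matters, ES lets you pin down when the translog (its WAL)
is fsynced. A hedged example over plain REST, so no client-version
assumptions (the index name and URL are made up); the default differs
between ES versions, so setting it explicitly documents the guarantee you
rely on:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class CreateDurableIndex {
        public static void main(String[] args) throws Exception {
            // Fsync the translog on every request: slower writes, but an
            // acknowledged update survives a crash of the indexing node.
            // ("oak" index name and localhost URL are assumptions.)
            String settings = "{ \"settings\": {"
                    + " \"index.translog.durability\": \"request\","
                    + " \"index.number_of_replicas\": 1 } }";
            HttpURLConnection c = (HttpURLConnection)
                    new URL("http://localhost:9200/oak").openConnection();
            c.setRequestMethod("PUT");
            c.setDoOutput(true);
            c.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = c.getOutputStream()) {
                out.write(settings.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + c.getResponseCode()); // 200 on success
        }
    }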




>
> In Oak, with the way the async index update works based on checkpoints,
> it is ensured that the index will "eventually" contain the right data and
> no update is lost. If there is a failure in an index update, that cycle
> fails and the next cycle starts again from the same base state.
>

Sounds like the same level of guarantee, depending on how the client side
is implemented. Typically I didn't bother with a queue between the
application and the ES client because the ES client was so fast.


>
> Order of index update
> -
>
> Let's say I have 2 cluster nodes where the same node is being modified
>
> Original state /a {x:1}
>
> Cluster Node N1 - /a {x:1, y:2}
> Cluster Node N2 - /a {x:1, z:3}
>
> End State /a {x:1, y:2, z:3}
>
> At the Oak level both commits would succeed as there is no conflict.
> However N1 and N2 would not see each other's updates immediately;
> that depends on the background read. So in this case what would the
> index update look like?
>
> 1. Would index updates for specific paths go to some master which would
> order the updates
>

correct.
Documents are sharded by ID so all updates hit the same shard.
That may result in network traffic if the shard is not local.
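To illustrate with the transport-client API of that era (index and type
names are made up; this is a sketch, not a proposed implementation): if
each cluster node sends only its delta as a partial update keyed by the Oak
node id, both updates route to the same shard and merge, whatever order
they arrive in:

    import org.elasticsearch.client.Client;

    class DeltaUpdates {
        // Partial updates keyed by the Oak node id. Routing defaults to the
        // document id, so both deltas hit the same shard and are merged into
        // the stored document.
        void applyDeltas(Client client) {
            client.prepareUpdate("oak", "node", "/a").setDoc("{\"y\":2}").get(); // from N1
            client.prepareUpdate("oak", "node", "/a").setDoc("{\"z\":3}").get(); // from N2
            // /a ends as {x:1, y:2, z:3} regardless of arrival order, though
            // queries may observe the intermediate states.
        }
    }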



> 2. Or would it end up with either of {x:1, y:2} or {x:1, z:3}
>
> Here the current async index update logic ensures that it sees the
> eventually expected order of changes and hence stays consistent
> with the repository state.


> Backup and Restore
> ---
>
> Would the backup now involve backing up ES index files from each
> cluster node? Or, assuming full replication, would it involve backing up
> files from any one of the nodes? Would the backup be in sync with the
> last changes done in the repository (assuming a sudden shutdown where
> changes got committed to the repository but not yet to any index)?
>
> Here the current approach of storing index files as part of MVCC storage
> ensures that the index state is consistent with some "checkpointed" state
> in the repository. And post restart it would eventually catch up with the
> current repository state and hence would not require a complete rebuild
> of the index in case of unclean shutdowns.
>

If the revision is present in the document, then I assume it can be
filtered at query time.
However, there may be problems here, as one might have to find some way of
indexing the revision history of a document like the format in
MongoDB... I did wonder if a better solution was to use ES as the primary
storage; then all the property indexes would be present by default with no
need for any Lucene index plugin. But I stopped thinking about that
with the 1s root document sync, as my interest was real time.

Best Regards
Ian


>
>
> Chetan Mehrotra
>


Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-11 Thread Chetan Mehrotra
On Thu, Aug 11, 2016 at 3:03 PM, Ian Boston  wrote:
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time. That introduces problems.
...
> Both Solr Cloud and ES address this by sharding and
> replicating the indexes, so that all commits are soft, instant and real
> time.

This would really be useful. However I have a couple of aspects to clarify

Index Update Guarantee


Let's say a commit succeeds and then we update the index, and the index
update fails for some reason. Would that update be missed, or
can there be some mechanism to recover? I am not very sure about the WAL
here; it may be the answer, but still confirming.

In Oak, with the way the async index update works based on checkpoints,
it is ensured that the index will "eventually" contain the right data and
no update is lost. If there is a failure in an index update, that cycle
fails and the next cycle starts again from the same base state.

Order of index update
-

Let's say I have 2 cluster nodes where the same node is being modified

Original state /a {x:1}

Cluster Node N1 - /a {x:1, y:2}
Cluster Node N2 - /a {x:1, z:3}

End State /a {x:1, y:2, z:3}

At the Oak level both commits would succeed as there is no conflict.
However N1 and N2 would not see each other's updates immediately;
that depends on the background read. So in this case what would the
index update look like?

1. Would index updates for specific paths go to some master which would
order the updates
2. Or would it end up with either of {x:1, y:2} or {x:1, z:3}

Here the current async index update logic ensures that it sees the
eventually expected order of changes and hence stays consistent
with the repository state.

Backup and Restore
---

Would the backup now involve backing up ES index files from each
cluster node? Or, assuming full replication, would it involve backing up
files from any one of the nodes? Would the backup be in sync with the
last changes done in the repository (assuming a sudden shutdown where
changes got committed to the repository but not yet to any index)?

Here the current approach of storing index files as part of MVCC storage
ensures that the index state is consistent with some "checkpointed" state
in the repository. And post restart it would eventually catch up with the
current repository state and hence would not require a complete rebuild
of the index in case of unclean shutdowns.


Chetan Mehrotra


Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-11 Thread Ian Boston
Hi,

There is no need to have several different plugins to deal with the
standalone, small-scale cluster and large-scale cluster deployments. It
might be desirable for some reason, but it's not necessary.

I have pushed the code I was working on before I got distracted to a
GitHub repo. [1] is where the co-located ES cluster starts. If the
property es-server-url is defined, an external ES cluster is used instead.

The repo is WIP and incomplete, and you will see 2 attempts to port the
Lucene plugin; take2 is the second. As I said, I stopped when it became
apparent there was a 1s latency imposed by Oak. I think you enlightened me
to that behavior on oak-dev.

I don't know how to co-locate a Solr Cloud cluster in the same way, given
it needs Zookeeper. (I don't know enough about Solr Cloud TBH.)
If Oak can't stomach using ES as a library, it could, with enough time
and resources, re-implement the pattern or something close.

Best Regards
Ian

1
https://github.com/ieb/oak-es/blob/master/src/main/java/org/apache/jackrabbit/oak/plusing/index/es/index/ESServer.java#L27

On 11 August 2016 at 09:58, Chetan Mehrotra 
wrote:

> A couple of points around the motivation and target use case of Hybrid
> Indexing, and Oak indexing in general.
>
> Based on my understanding of various deployments, any application
> based on Oak has 2 types of query requirements
>
> QR1. Application Query - These mostly involve some property
> restrictions and are invoked by code itself to perform some operation.
> The properties involved here would in most cases be sparse, i.e. present
> in a small subset of the whole repository content. Such queries need to
> be very fast and they might be invoked very frequently. Such queries
> should also be more accurate, and results should not lag the repository
> state much.
>
> QR2. User provided query - These queries would consist of both or
> either of property restrictions and fulltext constraints. The target
> nodes may form the majority of the overall repository content. Such
> queries need to be fast, but being user-driven they need not be extremely fast.
>
> Note that the speed criteria are very subjective and relative here.
>
> Further, Oak needs to support these deployments:
>
> 1. On a single setup - for dev, or prod on SegmentNodeStore
> 2. Cluster setup on premise
> 3. Deployment in some data center
>
> So Oak should enable deployments where smaller setups do not
> require any third-party system, while still allowing a dedicated
> system like ES/Solr to be plugged in if the need arises. So both use
> cases need to be supported.
>
> And further, even if it has access to such a third-party server, it might
> be fine to rely on embedded Lucene for #QR1 and just delegate queries
> under #QR2 to the remote system. This would ensure that query results are
> still fast for usage falling under #QR1.
>
> Hybrid Index Usecase
> -
>
> So far for #QR1 we only had property indexes and, to an extent, the
> Lucene-based property index, where results lag the repository state and
> the lag might be significant depending on load.
>
> The hybrid index aims to support queries under #QR1 and can be seen as a
> replacement for the existing non-unique property indexes. Such indexes
> would have lower storage requirements and would not put much load on the
> remote storage for execution. It is not meant as a replacement for
> ES/Solr, but rather intends to address a different type of usage.
>
> Very large Indexes
> -
>
> For deployments having a very large repository, Solr- or ES-based indexes
> would be preferable, and there oak-solr can be used (some day oak-es!)
>
> So in brief: Oak should be self-sufficient for smaller deployments and
> still allow plugging in Solr/ES for large deployments, and there also
> provide a choice to the admin to configure a subset of indexes for such
> usage depending on the size.
>
>
>
>
>
>
> Chetan Mehrotra
>
>
> On Thu, Aug 11, 2016 at 1:59 PM, Ian Boston  wrote:
> > Hi,
> >
> > On 11 August 2016 at 09:14, Michael Marth  wrote:
> >
> >> Hi Ian,
> >>
> >> No worries - good discussion.
> >>
> >> I should point out though that my reply to Davide was based on a
> >> comparison of the current design vs the Jackrabbit 2 design (in which
> >> indexes were stored locally). Maybe I misunderstood Davide’s comment.
> >>
> >> I will split my answer to your mail in 2 parts:
> >>
> >>
> >> >
> >> >Full text extraction should be separated from indexing, as the DS blobs
> >> are
> >> >immutable, so is the full text. There is code to do this in the Oak
> >> >indexer, but it's not used to write to the DS at present. It should be
> >> done
> >> >in a Job, distributed to all nodes, run only once per item. Full text
> >> >extraction is hugely expensive.
> >>
> >> My understanding is that Oak currently:
> >> A) runs full text extraction in a separate thread (separate from the
> >> “other” indexer)
> >> B) runs it only once per cluster
> >> If that is correct then the difference to what you mention above would
> be
> >> that 

Re: Oak Indexing. Was Re: Property index replacement / evolution

2016-08-11 Thread Chetan Mehrotra
A couple of points around the motivation and target use case of Hybrid
Indexing, and Oak indexing in general.

Based on my understanding of various deployments, any application
based on Oak has 2 types of query requirements

QR1. Application Query - These mostly involve some property
restrictions and are invoked by code itself to perform some operation.
The properties involved here would in most cases be sparse, i.e. present
in a small subset of the whole repository content. Such queries need to
be very fast and they might be invoked very frequently. Such queries
should also be more accurate, and results should not lag the repository
state much.

QR2. User provided query - These queries would consist of both or
either of property restrictions and fulltext constraints. The target
nodes may form the majority of the overall repository content. Such
queries need to be fast, but being user-driven they need not be extremely fast.

Note that the speed criteria are very subjective and relative here.

Further, Oak needs to support these deployments:

1. On a single setup - for dev, or prod on SegmentNodeStore
2. Cluster setup on premise
3. Deployment in some data center

So Oak should enable deployments where smaller setups do not
require any third-party system, while still allowing a dedicated
system like ES/Solr to be plugged in if the need arises. So both use
cases need to be supported.

And further, even if it has access to such a third-party server, it might
be fine to rely on embedded Lucene for #QR1 and just delegate queries
under #QR2 to the remote system. This would ensure that query results are
still fast for usage falling under #QR1.

Hybrid Index Usecase
-

So far for #QR1 we only had property indexes and, to an extent, the
Lucene-based property index, where results lag the repository state and
the lag might be significant depending on load.

The hybrid index aims to support queries under #QR1 and can be seen as a
replacement for the existing non-unique property indexes. Such indexes
would have lower storage requirements and would not put much load on the
remote storage for execution. It is not meant as a replacement for
ES/Solr, but rather intends to address a different type of usage.

Very large Indexes
-

For deployments having a very large repository, Solr- or ES-based indexes
would be preferable, and there oak-solr can be used (some day oak-es!)

So in brief: Oak should be self-sufficient for smaller deployments and
still allow plugging in Solr/ES for large deployments, and there also
provide a choice to the admin to configure a subset of indexes for such
usage depending on the size.






Chetan Mehrotra


On Thu, Aug 11, 2016 at 1:59 PM, Ian Boston  wrote:
> Hi,
>
> On 11 August 2016 at 09:14, Michael Marth  wrote:
>
>> Hi Ian,
>>
>> No worries - good discussion.
>>
>> I should point out though that my reply to Davide was based on a
>> comparison of the current design vs the Jackrabbit 2 design (in which
>> indexes were stored locally). Maybe I misunderstood Davide’s comment.
>>
>> I will split my answer to your mail in 2 parts:
>>
>>
>> >
>> >Full text extraction should be separated from indexing, as the DS blobs
>> are
>> >immutable, so is the full text. There is code to do this in the Oak
>> >indexer, but it's not used to write to the DS at present. It should be
>> done
>> >in a Job, distributed to all nodes, run only once per item. Full text
>> >extraction is hugely expensive.
>>
>> My understanding is that Oak currently:
>> A) runs full text extraction in a separate thread (separate from the
>> “other” indexer)
>> B) runs it only once per cluster
>> If that is correct then the difference to what you mention above would be
>> that you would like the FT indexing not to be pinned to one instance but
>> rather be distributed, say round-robin.
>> Right?
>>
>
>
> Yes.
>
>
>>
>>
>> >Building the same index on every node doesn't scale for the reasons you
>> >point out, and eventually hits a brick wall.
>> >http://lucene.apache.org/core/6_1_0/core/org/apache/
>> lucene/codecs/lucene60/package-summary.html#Limitations.
>> >(Int32 on Document ID per index). One of the reasons for the Hybrid
>> >approach was the number of Oak documents in some repositories will exceed
>> >that limit.
>>
>> I am not sure what you are arguing for with this comment…
>> It sounds like an argument in favour of the current design - which is
>> probably not what you mean… Could you explain, please?
>>
>
> I didn't communicate that very well.
>
> Currently Lucene (6.1) has a limit of Int32 on the number of documents it
> can store in an index, IIUC. There is a long-term desire to increase that
> to Int64, but no long-term commitment, as it is probably significant
> work given that arrays in Java are indexed with Int32.
>
> The Hybrid approach doesn't help with the potential Lucene brick wall,
> but one motivation for looking at it was the number of Oak Documents,
> including those under /oak:index, which is, in some cases, approaching
> that limit.
>
>
>
>>
>>
>> Thanks!
>> Michael

Re: Property index replacement / evolution

2016-08-09 Thread Ian Boston
Hi,

On 8 August 2016 at 15:39, Vikas Saurabh  wrote:

> Hi Ian,
>
> On Mon, Aug 8, 2016 at 3:41 PM, Ian Boston  wrote:
> >
> > If every successful commit writes the root node, due to every update
> > updating a sync prop index, this leaves me wondering how the delayed sync
> > reduces the writes to the root node ?
> >
> > I thought the justification of the 1s sync operation was to reduce the
> > writes to the root node to n/s where n is the number of instances in the
> > cluster, however based on what you are telling me the rate is (m+n)/s
> where
> > m is the total commits per second of the whole cluster. I understand that
> > the update to test for a conflicted commit may not be the same as the
> > update of _lastRevs, but in MongoDB both update the same MongoDB document.
> >
>
> I'm not sure of the exact numbers around how MongoDB would perform for
> lots of edits to the same document. There's a bit of difference
> between _lastRev write and commit-root conditional update -
> commit-root update is a change on a sub-document... so, something like
> 'set "_revision.rX"="c" on _id=0:/ iff "_conflict.rX" doesn't exist'.
> While last rev updates change the same key across commits from the
> same cluster node - something like 'set "_lastRevs.r0-0-X"="rY-0-X" '.
> I think the idea is to avoid any conflict on MongoDB's update
> statements. I'm not sure if such edits (edits to same doc but at a
> different sub-doc/key) degrade performance badly.
>

You are correct that a conditional update won't cost as much as a
non-conditional update, if no write is performed. And if no write is
performed, neither is replication, so the cost is low. However, AFAIK, a
MongoDB document is a single document stored against a single _id key;
_conflict.rX and _lastRevs are all part of the same BSON object.

So every write, even conditionally to a sub-document, will make the root
document hot, and since MongoDB shards on _id, that makes 1 MongoDB shard
hot. Every Oak commit will result in an update op on the MongoDB primary
holding the root document. This isn't specific to MongoMK; it probably
impacts all DocumentMK implementations.

OAK-4638 and OAK-4412 will need to eliminate all sync property indexes to
change this behaviour (item 3 at the start of the thread).

Alternatively, move the indexes so that a sync property index update
doesn't perform a conditional change to the global root document? (A new
thread would be required to discuss this, if worth talking about.)


> Thanks,
> Vikas
> PS: I wonder if we should open a different thread as it seems to be
> digressing from the subject :)
>

I'll try not to digress.

Best Regards
Ian


Re: Property index replacement / evolution

2016-08-08 Thread Vikas Saurabh
Hi Ian,

On Mon, Aug 8, 2016 at 3:41 PM, Ian Boston  wrote:
>
> If every successful commit writes the root node, due to every update
> updating a sync prop index, this leaves me wondering how the delayed sync
> reduces the writes to the root node ?
>
> I thought the justification of the 1s sync operation was to reduce the
> writes to the root node to n/s where n is the number of instances in the
> cluster, however based on what you are telling me the rate is (m+n)/s where
> m is the total commits per second of the whole cluster. I understand that
> the update to test for a conflicted commit may not be the same as the
> update of _lastRevs, but in MongoDB both update the same MongoDB document.
>

I'm not sure of the exact numbers around how MongoDB would perform for
lots of edits to the same document. There's a bit of difference
between _lastRev write and commit-root conditional update -
commit-root update is a change on a sub-document... so, something like
'set "_revision.rX"="c" on _id=0:/ iff "_conflict.rX" doesn't exist'.
While last rev updates change the same key across commits from the
same cluster node - something like 'set "_lastRevs.r0-0-X"="rY-0-X" '.
I think the idea is to avoid any conflict on MongoDB's update
statements. I'm not sure if such edits (edits to same doc but at a
different sub-doc/key) degrade performance badly.
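A sketch of that conditional update with the MongoDB Java driver, using the
field and revision names from the pseudo-statements above (illustrative,
not Oak's exact schema):

    import static com.mongodb.client.model.Filters.and;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Filters.exists;
    import static com.mongodb.client.model.Updates.set;

    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    class CommitRootUpdate {
        // Mark revision rX committed on the commit-root, but only if no
        // conflict marker exists for it; a matched count of 0 means the
        // commit must fail. Field names mirror the pseudo-statements above.
        boolean markCommitted(MongoCollection<Document> nodes) {
            return nodes.updateOne(
                    and(eq("_id", "0:/"), exists("_conflict.rX", false)),
                    set("_revision.rX", "c"))
                .getMatchedCount() == 1;
        }
    }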

Thanks,
Vikas
PS: I wonder if we should open a different thread as it seems to be
digressing from the subject :)


Re: Property index replacement / evolution

2016-08-08 Thread Ian Boston
Hi Vikas,

On 8 August 2016 at 14:13, Vikas Saurabh  wrote:

> Hi Ian,
>
> On Sun, Aug 7, 2016 at 10:01 AM, Ian Boston  wrote:
> > Also, IIRC, the root document is not persisted on every commit, but
> > synchronized periodically (once every second) similar to fsync on a disk.
> > So the indexes (in fact all Oak Documents) are synchronous on the local
> Oak
> > instance and are synchronous on remote Oak instances but with a minimum
> > data latency of the root document sync rate (1s). IIUC the 1 second sync
> > period is a performance optimisation as the root document must be updated
> > by every commit and hence is a global singleton in an Oak cluster, and
> > already hot as you point out in 3.
> >
>
> Just to clarify a bit: there are potentially 2 updates that can modify
> the root document.
> With every commit, Oak (DocumentMK) designates a document as the
> commit-root. That's the root of the subtree that changes. A commit is
> successful if the commit-root could be conditionally updated (the
> condition checks whether the commit conflicted with something else or
> not). With synchronous prop indices, the commit-root usually is at the
> root - so each successful commit writes to the root. That's what Michael
> was pointing to in point 3.
> The other update is the asynchronous update of _lastRevs - _lastRevs
> control the visibility horizon. For local nodes, a pending list of
> updates is kept in memory so local sessions/builders get to see committed
> changes. These are pushed to the persistence store (Mongo) during the
> background update, which defaults to a 1 s interval. So, other cluster
> nodes don't see changes immediately.
>

Thanks for the explanation. I learnt something more.

If every successful commit writes the root node, due to every update
updating a sync prop index, this leaves me wondering how the delayed sync
reduces the writes to the root node ?

I thought the justification of the 1s sync operation was to reduce the
writes to the root node to n/s where n is the number of instances in the
cluster, however based on what you are telling me the rate is (m+n)/s where
m is the total commits per second of the whole cluster. I understand that
the update to test for a conflicted commit may not be the same as the
update of _lastRevs, but in MongoDB both update the same MongoDB document.

Best Regards
Ian


>
> Thanks,
> Vikas
>


Re: Property index replacement / evolution

2016-08-08 Thread Vikas Saurabh
Hi Ian,

On Sun, Aug 7, 2016 at 10:01 AM, Ian Boston  wrote:
> Also, IIRC, the root document is not persisted on every commit, but
> synchronized periodically (once every second) similar to fsync on a disk.
> So the indexes (in fact all Oak Documents) are synchronous on the local Oak
> instance and are synchronous on remote Oak instances but with a minimum
> data latency of the root document sync rate (1s). IIUC the 1 second sync
> period is a performance optimisation as the root document must be updated
> by every commit and hence is a global singleton in an Oak cluster, and
> already hot as you point out in 3.
>

Just to clarify a bit: there are potentially 2 updates that can modify
the root document.
With every commit, Oak (DocumentMK) designates a document as the
commit-root. That's the root of the subtree that changes. A commit is
successful if the commit-root could be conditionally updated (the
condition checks whether the commit conflicted with something else or
not). With synchronous prop indices, the commit-root usually is at the
root - so each successful commit writes to the root. That's what Michael
was pointing to in point 3.
The other update is the asynchronous update of _lastRevs - _lastRevs
control the visibility horizon. For local nodes, a pending list of
updates is kept in memory so local sessions/builders get to see committed
changes. These are pushed to the persistence store (Mongo) during the
background update, which defaults to a 1 s interval. So, other cluster
nodes don't see changes immediately.

Thanks,
Vikas


Re: Property index replacement / evolution

2016-08-07 Thread Ian Boston
Hi,
For TarMK, none of this is an issue as TarMK is all in memory on 1 JVM with
local disk. Scaling up by throwing RAM and IO at the problem is a viable
option, as far as it's safe/sensible to do so. But TarMK doesn't cluster,
and if it did cluster, this would probably be an issue.

I think, but could easily be wrong, that in the case of MongoDB all
modifications to indexes generated by a commit are persisted in a single
batch request (i.e. 1 MongoDB statement). The time taken to process
that request depends on the size of the request. Large requests can
take seconds on large databases. It's not the distance between Oak and the
database that matters, as only 1 MongoDB statement is used; it's the
processing time of that statement in MongoDB that matters. With MongoDB
set up correctly to not lose data, this statement must be written to a
majority of replicas before processing can continue. MongoDB replication
is sequential.
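For context, "set up correctly to not lose data" here means a majority
write concern, e.g. with the MongoDB Java driver (the collection name is an
assumption):

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    class MajorityWrites {
        // Acknowledge a write only after a majority of replica set members
        // have persisted it; each member applies operations sequentially.
        // ("nodes" is an illustrative collection name.)
        MongoCollection<Document> nodes(MongoDatabase db) {
            return db.getCollection("nodes").withWriteConcern(WriteConcern.MAJORITY);
        }
    }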

Also, IIRC, the root document is not persisted on every commit, but
synchronized periodically (once every second) similar to fsync on a disk.
So the indexes (in fact all Oak Documents) are synchronous on the local Oak
instance and are synchronous on remote Oak instances but with a minimum
data latency of the root document sync rate (1s). IIUC the 1 second sync
period is a performance optimisation as the root document must be updated
by every commit and hence is a global singleton in an Oak cluster, and
already hot as you point out in 3.

I have been involved on the periphery of OAK-4638 and OAK-4412. For me,
the main benefit is reducing the number of documents stored in the
database. While it is true that the number of documents stored in the
database doesn't matter for small numbers, with every document being
counted inside Oak, and every document having an impact on database
performance, having around 66% of the documents not contributing to
repository content storage reduces the ultimate capacity limit of an Oak
repository by the same amount: 2/3rds. With many applications being built
on top of Oak exploiting the deep content structure that Oak encourages
and makes so easy, this limit rapidly becomes a reality. What limit? The
limit at which one of the components ceases to work. I don't know which
one, or when, but it's there. A repository containing 100M content items
may need 1E10 documents due to both the application implementation and
synchronous indexing. Perhaps the application should fix itself, but so
should Oak.

Quite apart from all that, it is embarrassingly wasteful to be using Oak
documents in this way for non-TarMK repos; it is rather like implementing
Lucene in SQL.

To recap:
Addressing 1 and 2 is a requirement to reduce waste, increase the
performance of update operations and increase data scalability.
3 is not an issue; the pressure is already there without any indexes.
Every write has to update the root document for that update to become
visible, by design.

I am not a core Oak developer, just an observer, so if I got anything
wrong, please someone correct me and I will learn from the experience.

Best Regards
Ian







On 5 August 2016 at 18:04, Michael Marth  wrote:

> Hi,
>
> I have noticed OAK-4638 and OAK-4412 – which both deal with particular
> problematic aspects of property indexes. I realise that both issues deal
> with slightly different problems and hence come to different suggested
> solutions.
> But still I felt it would be good to take a holistic view on the different
> problems with property indexes. Maybe there is a unified approach we can
> take.
>
> To my knowledge there are 3 areas where property indexes are problematic
> or not ideal:
>
> 1. Number of nodes: Property indexes can create a large number of nodes.
> For properties that are very common the number of index nodes can be almost
> as large as the number of the content nodes. A large number of nodes is not
> necessarily a problem in itself, but if the underlying persistence is e.g.
> MongoDB then those index nodes (i.e. MongoDB documents) cause pressure on
> MongoDB’s mmap architecture which in turn affects reading content nodes.
>
> 2. Write performance: when the persistence (i.e. MongoDB) and Oak are “far
> away from each other” (i.e. high network latency or low throughput) then
> synchronous property indexes affect the write throughput as they may cause
> the payload to double in size.
>
> 3. I have no data on this one – but think it might be a topic: property
> index updates usually cause commits to have / as the commit root. This
> results in pressure on the root document.
>
> Please correct me if I got anything wrong or inaccurate in the above.
>
> My point is, however, that at the very least we should have clarity on
> which of the items above we intend to tackle with Oak improvements.
> Ideally we would have a unified approach.
> we would have a unified approach.
> (I realize that property indexes come in various flavours like unique
> index or not, which makes the discussion more complex)
>
> my2c
> 

Property index replacement / evolution

2016-08-05 Thread Michael Marth
Hi,

I have noticed OAK-4638 and OAK-4412 – which both deal with particular 
problematic aspects of property indexes. I realise that both issues deal with 
slightly different problems and hence come to different suggested solutions.
But still I felt it would be good to take a holistic view on the different 
problems with property indexes. Maybe there is a unified approach we can take.

To my knowledge there are 3 areas where property indexes are problematic or not 
ideal:

1. Number of nodes: Property indexes can create a large number of nodes. For 
properties that are very common the number of index nodes can be almost as 
large as the number of the content nodes. A large number of nodes is not 
necessarily a problem in itself, but if the underlying persistence is e.g. 
MongoDB then those index nodes (i.e. MongoDB documents) cause pressure on 
MongoDB’s mmap architecture which in turn affects reading content nodes.

2. Write performance: when the persistence (i.e. MongoDB) and Oak are “far away 
from each other” (i.e. high network latency or low throughput) then synchronous 
property indexes affect the write throughput as they may cause the payload to 
double in size.

3. I have no data on this one – but think it might be a topic: property index 
updates usually cause commits to have / as the commit root. This results in 
pressure on the root document.

Please correct me if I got anything wrong or inaccurate in the above.

My point is, however, that at the very least we should have clarity on which 
of the items above we intend to tackle with Oak improvements. Ideally we would 
have a unified approach.
(I realize that property indexes come in various flavours like unique index or 
not, which makes the discussion more complex)

my2c
Michael