[ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350812#comment-15350812
 ] 

Chetan Mehrotra commented on OAK-4412:
--------------------------------------

Thanks Ian for the details here! Very useful; I will go through the talk. 
Note that in Oak the target use case is not a generic search server. 

By design, code running on a specific cluster node can only see changes up to 
that node's latest head revision. So there is no notion of a "cluster 
consistent index" for very recent changes. 

bq. 1. No increase in index update throughput or reduction in data latency for 
a cluster consistent index. Still single threaded cluster sequential.
Ack.

bq. 2. Reduction in data latency for changes isolated to one node, mitigated by 
sticky http sessions, which are/were considered an indication of a non scalable 
platform.

Yes, in general sticky sessions hamper scalability. However, given the design 
of Oak and how applications use it, i.e. application code is colocated with 
the data and the application process itself is stateful, stickiness is 
required. 

In the normal case, where applications are stateless, you can connect to a 
single source of truth, so changes made by one node are visible to the others 
and there is no need for stickiness. This is not the case with Oak-based 
applications, and hence stickiness becomes necessary.

Also note that most of the current property indexes are used by application 
code rather than for user-provided search, i.e. the search query is coded in 
the application.

bq. 5. Added complexity resulting in lower reliability.

Indeed. Supporting this makes the implementation more complex!

For #3 and #4: even without this change, and say with direct traversal (no 
query), application code would see different things depending on the session 
revision. So applications built on top of Oak/JCR are already dealing with 
this behaviour. This feature tries to bring the Lucene index to the level of 
the property index in terms of data latency on the same cluster node. If this 
works as expected, we can remove quite a few property indexes (leaving only 
the unique ones) and save on node storage.
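
To make the idea from the issue description concrete, here is a rough sketch 
of the behaviour being discussed. It is not the patch and does not use Lucene 
or any Oak API: the class and method names are hypothetical, plain maps stand 
in for both indexes, and deletions are ignored for brevity. It only shows that 
a change becomes queryable on the local node right away, and that the local 
entries can be purged once the async index has caught up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: every name here is hypothetical, not part of Oak or the patch.
class HybridIndexSketch {
    // Stands in for the persisted, asynchronously updated Lucene index.
    private final Map<String, Set<String>> asyncIndex = new HashMap<>();
    // Stands in for the local index holding changes since the last async cycle.
    private final Map<String, Set<String>> localIndex = new HashMap<>();

    // Called for each observed change (local or remote): index it locally at once.
    void onChange(String term, String path) {
        localIndex.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
    }

    // Simulates one background indexing cycle: the main index catches up,
    // so the local entries become redundant and are purged.
    void runAsyncCycle() {
        for (Map.Entry<String, Set<String>> e : localIndex.entrySet()) {
            asyncIndex.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                      .addAll(e.getValue());
        }
        localIndex.clear();
    }

    // A query returns the union of the results from both indexes.
    Set<String> query(String term) {
        Set<String> result = new TreeSet<>();
        result.addAll(asyncIndex.getOrDefault(term, Collections.emptySet()));
        result.addAll(localIndex.getOrDefault(term, Collections.emptySet()));
        return result;
    }

    // Number of terms currently held only in the local index.
    int localSize() {
        return localIndex.size();
    }

    public static void main(String[] args) {
        HybridIndexSketch index = new HybridIndexSketch();
        index.onChange("title:oak", "/content/a");
        // Visible before any async cycle has run.
        System.out.println(index.query("title:oak"));
        index.runAsyncCycle();
        // Still visible after the local index has been purged.
        System.out.println(index.query("title:oak") + " local=" + index.localSize());
    }
}
```

Note that the sketch says nothing about cluster-wide consistency: each node 
would hold its own local index, which matches the point above that only 
same-node data latency improves.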

> Lucene hybrid index
> -------------------
>
>                 Key: OAK-4412
>                 URL: https://issues.apache.org/jira/browse/OAK-4412
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>             Fix For: 1.6
>
>         Attachments: OAK-4412.patch
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress-tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but the AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index is being updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea has been suggested by [~chetanm] in the discussion for the 
> OAK-4233.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
