[jira] [Commented] (OAK-4412) Lucene hybrid index

Ian Boston (JIRA) Mon, 27 Jun 2016 04:47:44 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350844#comment-15350844
 ]


Ian Boston commented on OAK-4412:
---------------------------------

bq.  Note that in Oak the target usecase is not a generic search server.

Agreed. All the techniques described are appropriate to any application using 
Lucene that requires low data latency and scalability, not just a generic 
search server. ElasticSearch, for instance, is frequently used for metrics 
aggregation and much less frequently as a generic search server.

bq. By design code running on specific cluster node can only see changes as per 
its head latest revision. So there is no notion of "cluster consistent index" 
for very recent changes.

Does that imply that, regardless of the indexing mechanism used, indexes will 
always have a data latency of, at best, the commit rate of the root node in the 
repository (IIUC each instance currently syncs its copy of the root node once 
a second, since the root node is a cluster singleton), and that indexing will 
always have to be tied to an Oak revision?

If there is a hard requirement that indexes are tied to the root node revision, 
then there seems to be no point in following the work done by others to lower 
data latency and increase throughput and scalability (i.e. ignore my previous 
observations, and sorry for the distraction).





> Lucene hybrid index
> -------------------
>
>                 Key: OAK-4412
>                 URL: https://issues.apache.org/jira/browse/OAK-4412
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>             Fix For: 1.6
>
>         Attachments: OAK-4412.patch
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress-tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index has been updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea was suggested by [~chetanm] in the discussion for 
> OAK-4233.
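
The flow in the issue description above (observer-fed local index, union at 
query time, purge after the async run) can be sketched roughly as follows. 
This is only an illustration in plain Java collections, not the attached 
patch and not the Oak or Lucene API: the class {{HybridIndexSketch}}, its 
method names, and the use of a numeric revision are all assumptions made for 
the example, and deletions are ignored.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: plain maps stand in for the two Lucene indexes.
class HybridIndexSketch {

    // Stand-in for the persisted, asynchronously updated "lucene" index: term -> paths.
    private final Map<String, Set<String>> mainIndex = new HashMap<>();

    // Stand-in for the local "lucene-memory" index: revision -> (term -> paths).
    // It only holds changes made since the last async indexing run.
    private final NavigableMap<Long, Map<String, Set<String>>> localIndex = new TreeMap<>();

    // Called from an observer, so it receives both local and remote changes.
    public synchronized void contentChanged(long revision, String path, String term) {
        localIndex.computeIfAbsent(revision, r -> new HashMap<>())
                  .computeIfAbsent(term, t -> new TreeSet<>())
                  .add(path);
    }

    // Simulates the background job catching up to the given revision.
    // Simplification: the real async index reads from the repository; here the
    // covered local entries are simply copied into the main index.
    public synchronized void asyncIndexUpTo(long revision) {
        NavigableMap<Long, Map<String, Set<String>>> covered = localIndex.headMap(revision, true);
        for (Map<String, Set<String>> byTerm : covered.values()) {
            for (Map.Entry<String, Set<String>> e : byTerm.entrySet()) {
                mainIndex.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        // headMap returns a view, so clearing it purges the covered local entries.
        covered.clear();
    }

    // A query returns the union of main-index and local-index results.
    public synchronized Set<String> query(String term) {
        Set<String> empty = Collections.emptySet();
        Set<String> result = new TreeSet<>(mainIndex.getOrDefault(term, empty));
        for (Map<String, Set<String>> byTerm : localIndex.values()) {
            result.addAll(byTerm.getOrDefault(term, empty));
        }
        return result;
    }

    // Number of revisions still held only in the local index.
    public synchronized int localRevisionCount() {
        return localIndex.size();
    }
}
```

In this sketch a change is queryable as soon as the observer callback runs, 
and the local index shrinks back once the async run has covered the same 
revisions, so its size is bounded by the async indexing interval.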



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
