[jira] [Commented] (OAK-4412) Lucene hybrid index

Thomas Mueller (JIRA) Fri, 09 Sep 2016 02:20:49 -0700

    [ https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15476437#comment-15476437 ]


Thomas Mueller commented on OAK-4412:
-------------------------------------

Some comments on the patch:

* Many new tests, that's great!
* What is the easiest way to get a "collapsed" patch (a single diff of trunk 
against the latest version of your patch)?
* Large and complex change, so the risk of bugs is still high (even with many 
new tests).
* Because the patch is large, it's hard to review. But reviewing intermediate 
steps is probably even more work.
* Adding "synchronized" to createReader may have an impact on concurrency.
* It looks like unrelated issues are touched, for example the lazy init of 
"FacetsConfig facetsConfig". Those would be better handled in separate issues.
* To reduce impact on existing code, it would be good if the new feature is 
"modular", so if it is not used, the risk is low. Not sure how to best do that, 
but usually adding abstraction layers helps.
* There are still "//TODO [hybrid] ..." comments; not sure if all of them are 
resolved?
* Basic code conventions are not followed (as always with code you wrote), for 
example sometimes a space is missing before "{".
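
On the "synchronized" point above: if the lock only guards one-time creation of 
the reader (an assumption, I did not verify this against the patch), one way to 
keep readers off the lock in the common case is double-checked locking on a 
volatile field. A minimal sketch; LazyReaderHolder and the factory argument are 
hypothetical names, not classes from the patch:

```java
import java.util.function.Supplier;

/** Hypothetical holder used for illustration only; not part of the patch. */
final class LazyReaderHolder<R> {
    // Stands in for whatever createReader() actually does (assumption).
    private final Supplier<R> factory;
    // volatile so the fast path can see a published reader without locking.
    private volatile R reader;

    LazyReaderHolder(Supplier<R> factory) {
        this.factory = factory;
    }

    R getReader() {
        R r = reader;                 // one volatile read on the fast path
        if (r == null) {
            synchronized (this) {     // lock is taken only while no reader exists
                r = reader;           // re-check: another thread may have created it
                if (r == null) {
                    r = factory.get();
                    reader = r;
                }
            }
        }
        return r;
    }
}
```

Once the reader exists, callers only pay for a volatile read. The same pattern 
would also cover a lazily initialized field such as facetsConfig, if that change 
is moved to its own issue.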

Performance: looks good to me.


> Lucene hybrid index
> -------------------
>
>                 Key: OAK-4412
>                 URL: https://issues.apache.org/jira/browse/OAK-4412
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene
>            Reporter: Tomek Rękawek
>            Assignee: Chetan Mehrotra
>             Fix For: 1.6
>
>         Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress-tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index has been updated, the local index will 
> be purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea was suggested by [~chetanm] in the discussion for 
> OAK-4233.
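
For reference, the query-time union described in the quoted issue could look 
roughly like the sketch below. It assumes both indexes return node paths and 
that a path present in both should be reported once; HybridResults and its 
union method are hypothetical names, not Oak API:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not an Oak class. */
final class HybridResults {
    private HybridResults() {
    }

    /**
     * Merges paths from the asynchronous (persisted) index with paths from
     * the local in-memory index, dropping duplicates and keeping first-seen order.
     */
    static List<String> union(Iterable<String> asyncPaths, Iterable<String> localPaths) {
        Set<String> seen = new LinkedHashSet<>();
        for (String path : asyncPaths) {
            seen.add(path);
        }
        for (String path : localPaths) {
            seen.add(path);   // no effect if the async index already returned this path
        }
        return new ArrayList<>(seen);
    }
}
```

Deduplication matters here because a node changed since the last background 
indexing run can match in both indexes.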



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
