[jira] [Commented] (OAK-4412) Lucene hybrid index

Chetan Mehrotra (JIRA) Thu, 08 Sep 2016 09:50:06 -0700

    [ https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474384#comment-15474384 ]


Chetan Mehrotra commented on OAK-4412:
--------------------------------------

Below are the results from the benchmark runs for Segment Tar (C is the concurrency level; min through max give the latency distribution in ms):

{noformat}
Apache Jackrabbit Oak 1.6-SNAPSHOT
# HybridIndexTest                  C     min     10%     50%     90%     max       N  Searcher  Mutator  Indexed
Oak-Segment-Tar-FDS                1      14      25      40      86     357    1206   1216185     8374     7434  #property
Oak-Segment-Tar-FDS                1      10      21      35      66     870    1356   1853336    10587     8400  #nrt
Oak-Segment-Tar-FDS                1      25      46      74     133    1227     633   1670741     4350     3966  #sync

Oak-Segment-Tar-FDS                5      15      73     117     187     542    2363   1586849     3344    14406  #property
Oak-Segment-Tar-FDS                5      10      65     102     170     940    2480   1800942     3952    15168  #nrt
Oak-Segment-Tar-FDS                5      67     144     219     382    2011    1127   1867438     1648     6894  #sync
{noformat}

Results for Mongo:
{noformat}
Apache Jackrabbit Oak 1.6-SNAPSHOT
# HybridIndexTest                  C     min     10%     50%     90%     max       N  Searcher  Mutator  Indexed
Oak-Mongo-FDS                      1      60      87     151     260     542     365    357941     2585     2286  #property
Oak-Mongo-FDS                      1      44      60     116     209     950     428    931318     4727     2724  #nrt
Oak-Mongo-FDS                      1      49     105     163     267    1456     301    942431     2773     1908  #sync

Oak-Mongo-FDS                      5     142     227     365     602    1040     763     43238     1149     4668  #property
Oak-Mongo-FDS                      5     120     177     254     414    1218    1036    354890     2451     6360  #nrt
Oak-Mongo-FDS                      5     152     256     346     552    1940     731    545089     1352     4488  #sync
{noformat}
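To make the comparisons in the observations easier to check, here is a small Java sketch that re-types the median, Searcher and Mutator columns from both tables and divides each "nrt"/"sync" value by the "property" baseline for the same fixture and concurrency. The class name is made up, and it assumes (not confirmed by the output above) that Searcher and Mutator count the queries and writes completed during a run.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Oak): a few columns re-typed from the
 * HybridIndexTest tables, compared against the "property" baseline.
 * Assumes Searcher/Mutator are counts of queries/writes completed in a run.
 */
class HybridBenchSummary {
    record Row(String fixture, int c, String mode, int p50, long searcher, long mutator) {}

    static final List<Row> ROWS = List.of(
            new Row("Oak-Segment-Tar-FDS", 1, "property", 40, 1216185, 8374),
            new Row("Oak-Segment-Tar-FDS", 1, "nrt", 35, 1853336, 10587),
            new Row("Oak-Segment-Tar-FDS", 1, "sync", 74, 1670741, 4350),
            new Row("Oak-Segment-Tar-FDS", 5, "property", 117, 1586849, 3344),
            new Row("Oak-Segment-Tar-FDS", 5, "nrt", 102, 1800942, 3952),
            new Row("Oak-Segment-Tar-FDS", 5, "sync", 219, 1867438, 1648),
            new Row("Oak-Mongo-FDS", 1, "property", 151, 357941, 2585),
            new Row("Oak-Mongo-FDS", 1, "nrt", 116, 931318, 4727),
            new Row("Oak-Mongo-FDS", 1, "sync", 163, 942431, 2773),
            new Row("Oak-Mongo-FDS", 5, "property", 365, 43238, 1149),
            new Row("Oak-Mongo-FDS", 5, "nrt", 254, 354890, 2451),
            new Row("Oak-Mongo-FDS", 5, "sync", 346, 545089, 1352));

    /** Looks up the row for one fixture, concurrency level (C) and index mode. */
    static Row find(String fixture, int c, String mode) {
        return ROWS.stream()
                .filter(r -> r.fixture().equals(fixture) && r.c() == c && r.mode().equals(mode))
                .findFirst()
                .orElseThrow();
    }

    /** Searcher count of a mode divided by the "property" Searcher count (same fixture and C). */
    static double searcherRatio(String fixture, int c, String mode) {
        return (double) find(fixture, c, mode).searcher() / find(fixture, c, "property").searcher();
    }

    /** Mutator count of a mode divided by the "property" Mutator count (same fixture and C). */
    static double mutatorRatio(String fixture, int c, String mode) {
        return (double) find(fixture, c, mode).mutator() / find(fixture, c, "property").mutator();
    }

    public static void main(String[] args) {
        for (Row r : ROWS) {
            if (r.mode().equals("property")) {
                continue; // the baseline compared with itself is always x1.00
            }
            System.out.printf("%-19s C=%d %-4s p50=%4d  searcher x%.2f  mutator x%.2f%n",
                    r.fixture(), r.c(), r.mode(), r.p50(),
                    searcherRatio(r.fixture(), r.c(), r.mode()),
                    mutatorRatio(r.fixture(), r.c(), r.mode()));
        }
    }
}
```

For example, on Segment Tar with C=1 the "sync" Mutator count is about 0.52 of the "property" one (4350 / 8374), one indication of the gap between the two modes there.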

The test can be executed with various permutations via [this script|^hybrid-benchmark.sh], and the complete stats can be seen [here|^hybrid-result-v1.txt].

# The "sync" index performs poorly compared to the "property" index on Segment Tar, but performs similarly on the Mongo setups.
# On Mongo, the "sync" index performs very well on the query side.
# The "nrt" index performs better in all setups. Here commits to the index are done asynchronously with a refresh delay of 1 sec. While the benchmark was running, the queue size stayed below 15 in all cases. So it can safely be said that:
## any change is reflected in query results with a delay of at most 1 sec
## the overhead of running LuceneIndexEditor is low, so it can safely be enabled. With a complex index (involving relative properties) the overhead might be somewhat higher, but such an index should not see such a large volume of edits. This needs to be benchmarked in real-world scenarios.
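The "nrt" behaviour described in the last point (the commit path only queues the change, and a background refresh makes it searchable within roughly the refresh delay) can be modelled with plain java.util.concurrent classes. This is a simplified, hypothetical sketch, not Oak's or Lucene's implementation: the class and method names are made up, a list of paths stands in for the Lucene index, and applying queued changes and refreshing the reader are folded into a single step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, stdlib-only model of the "nrt" mode (not Oak's or Lucene's API):
 * the commit path only enqueues a change; a background task applies queued
 * changes and publishes a new snapshot for queries every refresh interval.
 */
class NrtIndexSketch {
    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<String> indexed = new ArrayList<>(); // only mutated inside refresh()
    private volatile List<String> snapshot = List.of();     // what queries currently see

    /** Called on the commit path: cheap and never waits for index I/O. */
    public void enqueue(String path) {
        queue.add(path);
    }

    /** Number of changes not yet applied (analogous to the queue size watched in the benchmark). */
    public int queueSize() {
        return queue.size();
    }

    /** Applies all queued changes and publishes an immutable snapshot to searchers. */
    public synchronized void refresh() {
        String path;
        while ((path = queue.poll()) != null) {
            indexed.add(path);
        }
        snapshot = List.copyOf(indexed);
    }

    /** Queries see only changes applied by the last refresh(). */
    public List<String> search(String prefix) {
        return snapshot.stream().filter(p -> p.startsWith(prefix)).toList();
    }

    public static void main(String[] args) throws InterruptedException {
        NrtIndexSketch index = new NrtIndexSketch();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // 1 sec refresh delay, mirroring the setting mentioned in the comment.
        scheduler.scheduleWithFixedDelay(index::refresh, 1, 1, TimeUnit.SECONDS);

        index.enqueue("/content/a");
        System.out.println("right after commit: " + index.search("/content"));
        Thread.sleep(1500); // wait longer than one refresh interval
        System.out.println("after one refresh:  " + index.search("/content"));
        scheduler.shutdownNow();
    }
}
```

Keeping enqueue() cheap is what keeps the write-side overhead low in this model; the price is that a query may lag by up to one refresh interval, in line with the "max 1 sec delay" observation.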



> Lucene hybrid index
> -------------------
>
>                 Key: OAK-4412
>                 URL: https://issues.apache.org/jira/browse/OAK-4412
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene
>            Reporter: Tomek Rękawek
>            Assignee: Chetan Mehrotra
>             Fix For: 1.6
>
>         Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index has been updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea has been suggested by [~chetanm] in the discussion for the 
> OAK-4233.
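The proposal in the description (queries take the union of the persisted async index and a local index holding only the changes since the last background run, and the local part is purged once the main index catches up) can be sketched as an in-memory model. It is hypothetical: revisions are plain long values, only additions are modelled, the persisted index is a Set that the sketch fills itself to simulate the async job, and none of the names correspond to actual Oak classes.

```java
import java.util.LinkedHashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical model of the hybrid index proposal (not Oak code): a persisted
 * index updated by the async job plus a local index with only newer changes.
 */
class HybridIndexSketch {
    private final Set<String> persisted = new LinkedHashSet<>();      // stands in for {{lucene}}
    private final NavigableMap<Long, String> local = new TreeMap<>(); // stands in for {{lucene-memory}}, keyed by revision

    /** Observer callback: receives both local and remote changes. */
    public synchronized void onChange(long revision, String path) {
        local.put(revision, path);
    }

    /** The async job has indexed everything up to 'checkpoint': move that part over and purge it locally. */
    public synchronized void onAsyncIndexUpdate(long checkpoint) {
        NavigableMap<Long, String> covered = local.headMap(checkpoint, true);
        persisted.addAll(covered.values()); // simulates the async job having indexed these paths
        covered.clear();                    // clearing the view removes the entries from 'local'
    }

    /** A query returns the union of persisted and local results. */
    public synchronized Set<String> query(String prefix) {
        Set<String> result = new LinkedHashSet<>();
        for (String p : persisted) {
            if (p.startsWith(prefix)) result.add(p);
        }
        for (String p : local.values()) {
            if (p.startsWith(prefix)) result.add(p);
        }
        return result;
    }

    public synchronized int localSize() {
        return local.size();
    }

    public static void main(String[] args) {
        HybridIndexSketch index = new HybridIndexSketch();
        index.onChange(1, "/content/a");
        index.onChange(2, "/content/b");
        System.out.println(index.query("/content")); // both visible at once via the local part
        index.onAsyncIndexUpdate(1);                 // main index caught up to revision 1
        System.out.println(index.query("/content") + " local entries: " + index.localSize());
    }
}
```

The point of the design is that query results never have a gap: a change is served from the local part until the async index covers it, and only then is it purged.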



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
