[
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15474384#comment-15474384
]
Chetan Mehrotra commented on OAK-4412:
--------------------------------------
Below are the results from the benchmark runs for Segment Tar:
{noformat}
Apache Jackrabbit Oak 1.6-SNAPSHOT
# HybridIndexTest        C     min     10%     50%     90%     max       N  Searcher  Mutator  Indexed
Oak-Segment-Tar-FDS      1      14      25      40      86     357    1206   1216185     8374     7434  #property
Oak-Segment-Tar-FDS      1      10      21      35      66     870    1356   1853336    10587     8400  #nrt
Oak-Segment-Tar-FDS      1      25      46      74     133    1227     633   1670741     4350     3966  #sync
Oak-Segment-Tar-FDS      5      15      73     117     187     542    2363   1586849     3344    14406  #property
Oak-Segment-Tar-FDS      5      10      65     102     170     940    2480   1800942     3952    15168  #nrt
Oak-Segment-Tar-FDS      5      67     144     219     382    2011    1127   1867438     1648     6894  #sync
{noformat}
For Mongo:
{noformat}
Apache Jackrabbit Oak 1.6-SNAPSHOT
# HybridIndexTest        C     min     10%     50%     90%     max       N  Searcher  Mutator  Indexed
Oak-Mongo-FDS            1      60      87     151     260     542     365    357941     2585     2286  #property
Oak-Mongo-FDS            1      44      60     116     209     950     428    931318     4727     2724  #nrt
Oak-Mongo-FDS            1      49     105     163     267    1456     301    942431     2773     1908  #sync
Oak-Mongo-FDS            5     142     227     365     602    1040     763     43238     1149     4668  #property
Oak-Mongo-FDS            5     120     177     254     414    1218    1036    354890     2451     6360  #nrt
Oak-Mongo-FDS            5     152     256     346     552    1940     731    545089     1352     4488  #sync
{noformat}
The test can be executed with various permutations via [this
script|^hybrid-benchmark.sh] and complete stats can be seen
[here|^hybrid-result-v1.txt].
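For a quick comparison of the three modes, the N column can be normalized against the "property" run for the same fixture and concurrency. The sketch below does that with the N values copied from the tables. It assumes N is the number of completed test iterations (so higher is better); the class and method names are made up for this illustration and are not part of the benchmark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper, not part of Oak or of the attached script. */
final class HybridBenchmarkRatios {

    /** Ratio of a mode's N to the "property" N of the same fixture and concurrency. */
    static double relativeToProperty(long modeN, long propertyN) {
        return (double) modeN / propertyN;
    }

    public static void main(String[] args) {
        // N values copied from the tables, in the order {property, nrt, sync}.
        Map<String, long[]> runs = new LinkedHashMap<>();
        runs.put("Oak-Segment-Tar-FDS C=1", new long[] {1206, 1356, 633});
        runs.put("Oak-Segment-Tar-FDS C=5", new long[] {2363, 2480, 1127});
        runs.put("Oak-Mongo-FDS C=1", new long[] {365, 428, 301});
        runs.put("Oak-Mongo-FDS C=5", new long[] {763, 1036, 731});

        for (Map.Entry<String, long[]> e : runs.entrySet()) {
            long[] n = e.getValue();
            System.out.printf("%-24s nrt/property=%.2f sync/property=%.2f%n",
                    e.getKey(),
                    relativeToProperty(n[1], n[0]),
                    relativeToProperty(n[2], n[0]));
        }
    }
}
```

A ratio above 1.0 means the mode completed more iterations than the "property" index in the same setup.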
# The "sync" index performs badly compared to the "property" index for Segment
(N of 633 vs 1206 at C=1), but the two perform similarly for Mongo setups
# The "sync" index for Mongo performs very well on the query side
# The "nrt" index performs better for all setups. Here the commits to the index
are done asynchronously with a refresh delay of 1 sec. While running the
benchmark the queue size remained below 15 in all cases. So it can be safely
said that
## any change would be reflected in query results with a delay of at most 1 sec
## the overhead of running LuceneIndexEditor is lower, so it can be safely
enabled. With a complex index (involving relative properties) the overhead
might be a bit higher, but in that case we should not see such a large volume
of edits. So it needs to be benchmarked in real-world scenarios
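The "nrt" behaviour described above (the commit only enqueues, a background worker writes to the index, and the reader is reopened at most once per refresh delay) can be sketched roughly as follows. This is a single-threaded simplification with hypothetical names; it is not the actual Oak implementation, and the clock is passed in explicitly so the example stays deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical illustration of an async index with a refresh delay. */
final class NrtIndexSketch {
    private final Queue<String> pending = new ArrayDeque<>();  // queued by commits
    private final List<String> written = new ArrayList<>();    // indexed, maybe not yet visible
    private List<String> visible = new ArrayList<>();          // snapshot used by queries
    private final int capacity;
    private final long refreshDelayMillis;
    private long lastRefreshMillis;

    NrtIndexSketch(int capacity, long refreshDelayMillis, long nowMillis) {
        this.capacity = capacity;
        this.refreshDelayMillis = refreshDelayMillis;
        this.lastRefreshMillis = nowMillis;
    }

    /** Commit path: only enqueue, so the commit does not wait for indexing. */
    boolean offer(String path) {
        if (pending.size() >= capacity) {
            return false; // queue full; the caller decides how to handle it
        }
        return pending.add(path);
    }

    /** Background worker: move everything queued so far into the index. */
    int drain() {
        int count = 0;
        for (String p = pending.poll(); p != null; p = pending.poll()) {
            written.add(p);
            count++;
        }
        return count;
    }

    /** Reopen the query snapshot, but only if the refresh delay has elapsed. */
    boolean refreshIfDue(long nowMillis) {
        if (nowMillis - lastRefreshMillis < refreshDelayMillis) {
            return false;
        }
        visible = new ArrayList<>(written);
        lastRefreshMillis = nowMillis;
        return true;
    }

    boolean isVisible(String path) { return visible.contains(path); }

    int queueSize() { return pending.size(); }

    public static void main(String[] args) {
        NrtIndexSketch index = new NrtIndexSketch(15, 1000, 0);
        index.offer("/content/a");
        index.drain();
        System.out.println("visible at 500 ms: " + (index.refreshIfDue(500) && index.isVisible("/content/a")));
        System.out.println("visible at 1000 ms: " + (index.refreshIfDue(1000) && index.isVisible("/content/a")));
    }
}
```

With a refresh delay of 1000 ms, a change that has been drained becomes visible at the next refresh, which is where the "at most 1 sec" bound comes from as long as the queue keeps up.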
> Lucene hybrid index
> -------------------
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Fix For: 1.6
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh,
> hybrid-result-v1.txt
>
>
> When running Oak in a cluster, each write operation is expensive. After
> performing some stress-tests with a geo-distributed Mongo cluster, we've
> found out that updating property indexes is a large part of the overall
> traffic.
> The asynchronous index would be an answer here (as the index update won't be
> made in the client request thread), but AEM requires the updates to be
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a
> synchronous, locally-stored counterpart that will persist only the data since
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local
> files. Once the "main" Lucene index has been updated, the local index will be
> purged.
> Queries will use a union of results from the {{lucene}} and
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using
> an observer, so it'll get both local and remote changes.
> The original idea has been suggested by [~chetanm] in the discussion for
> OAK-4233.
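
The idea in the issue description (an observer feeds a local index, queries take the union of both indexes, and the local index is purged once the main index has caught up) can be sketched as below. Plain sets of paths stand in for the {{lucene}} and {{lucene-memory}} indexes, and all class and method names are hypothetical; in particular, the real async job would index from the repository rather than copy entries from the local index.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of the hybrid index idea; not Oak code. */
final class HybridIndexSketch {
    private final Set<String> persisted = new LinkedHashSet<>(); // stands in for "lucene"
    private final Set<String> local = new LinkedHashSet<>();     // stands in for "lucene-memory"

    /** Observer callback: local and remote changes land in the local index only. */
    void contentChanged(String path) {
        local.add(path);
    }

    /** Background job: the main index catches up, then the local index is purged. */
    void asyncIndexUpdate() {
        persisted.addAll(local); // simplification of reindexing from the repository
        local.clear();
    }

    /** Query: union of results from both indexes. */
    Set<String> query() {
        Set<String> result = new LinkedHashSet<>(persisted);
        result.addAll(local);
        return result;
    }

    int localSize() { return local.size(); }

    public static void main(String[] args) {
        HybridIndexSketch index = new HybridIndexSketch();
        index.contentChanged("/content/a");
        System.out.println("before async update: " + index.query());
        index.asyncIndexUpdate();
        System.out.println("after async update: " + index.query() + ", local size " + index.localSize());
    }
}
```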