[ https://issues.apache.org/jira/browse/OAK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chetan Mehrotra updated OAK-1702:
---------------------------------
Attachment: OAK-1702-shared-indexer-2.patch
Updated [patch|^OAK-1702-shared-indexer-2.patch] which relies on NodeState.
With this patch the following performance numbers were observed. Test setup:
* The test now fetches the first 100 rows
* Multi-threaded runs are also executed
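Because the test reads only the first 100 rows, it helps if results are pulled lazily in fixed-size batches instead of all at once. The following is a minimal, stdlib-only sketch of such a batched cursor. It assumes a hypothetical `(offset, limit) -> rows` fetcher in place of the real index lookup; `BatchedCursor` is an illustrative name, not an Oak class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/**
 * Sketch: iterate query results by fetching them in fixed-size batches.
 * The fetcher (offset, limit) -> rows is a hypothetical stand-in for the
 * index lookup; it is not an Oak or Lucene API.
 */
class BatchedCursor<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetcher;
    private final int batchSize;
    private List<T> batch = new ArrayList<>();
    private int posInBatch = 0;
    private int offset = 0;          // offset of the next batch to fetch
    private boolean exhausted = false;
    int fetchCount = 0;              // number of fetcher calls, for illustration

    BatchedCursor(BiFunction<Integer, Integer, List<T>> fetcher, int batchSize) {
        this.fetcher = fetcher;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (posInBatch < batch.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // Current batch fully consumed: load the next one on demand.
        batch = fetcher.apply(offset, batchSize);
        fetchCount++;
        offset += batch.size();
        posInBatch = 0;
        if (batch.size() < batchSize) {
            exhausted = true;        // a short batch means there are no more rows
        }
        return !batch.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.get(posInBatch++);
    }
}
```

A caller that stops after 100 rows triggers a single fetch, while a full iteration over 250 rows needs three fetches (100, 100, 50).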
Legend
* Shared searcher - Benchmark run with just the attached patch applied. It
includes support for a shared searcher and batch querying. Results are loaded in
batches of 100
* disable compression - Compression disabled via a custom OakCodec
* mlt off - The full text content is not stored as part of the index
* local dir - The Lucene index is first copied to the local file system and
then an FSDirectory is opened on it. This feature is optional and can be enabled
via configuration
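The copy step of the optional local dir mode can be sketched with plain `java.nio.file`. This is only an illustration under stated assumptions: a plain source directory stands in for the repository-backed OakDirectory, a local file with the same name and length is treated as already up to date, and `LocalIndexCopier` is a hypothetical name. The patch itself may detect changes differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Sketch of the "local dir" idea: mirror index files into a local
 * directory so that a native FSDirectory can later be opened on it.
 */
class LocalIndexCopier {
    /** Copies missing or changed files; returns how many were copied. */
    static int copyIndexLocally(Path source, Path localDir) throws IOException {
        Files.createDirectories(localDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path src : files) {
                if (!Files.isRegularFile(src)) {
                    continue;
                }
                Path dest = localDir.resolve(src.getFileName().toString());
                // Assumption: same name and same length means already copied.
                if (Files.exists(dest) && Files.size(dest) == Files.size(src)) {
                    continue;
                }
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        // In the real feature an FSDirectory would now be opened on localDir.
        return copied;
    }
}
```

Calling it a second time on an unchanged source copies nothing, so repeated refreshes only pay for new or changed index files.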
I still need to add test cases for the logic in SearcherManager. However, I would
like a review of the approach taken.
Key observations
* With the default Oak-Tar setup, using OakDirectory for querying works fine
* When the FileDataStore (FDS) is used, a native FSDirectory performs better
* If MLT is disabled then there is no need to disable compression. I think the
key issue was that we were storing all the content, which slows down reading
the path value, as noted by Alex in a previous comment
* SearcherManager - The current approach relies on a time delay between
subsequent checks for changes in the directory. If a change is detected (by
comparing Lucene segment versions) then a new searcher is opened
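The SearcherManager refresh policy described above (a minimum delay between change checks, reopening only when the segment version differs) can be sketched without Lucene. The version supplier is a hypothetical stand-in for reading the Lucene segment version, the opener stands in for opening a new IndexSearcher, and the injected clock exists only to make the sketch testable. `DelayedRefreshManager` is an illustrative name, not the class in the patch.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Sketch: share one searcher, check for index changes at most once per
 * refresh interval, and reopen only when the index version has changed.
 */
class DelayedRefreshManager<S> {
    private final LongSupplier indexVersion;   // stand-in for the segment version
    private final Supplier<S> opener;          // stand-in for opening a searcher
    private final LongSupplier clockMillis;
    private final long refreshIntervalMillis;

    private S current;
    private long currentVersion;
    private long lastCheckMillis;

    DelayedRefreshManager(LongSupplier indexVersion, Supplier<S> opener,
                          LongSupplier clockMillis, long refreshIntervalMillis) {
        this.indexVersion = indexVersion;
        this.opener = opener;
        this.clockMillis = clockMillis;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.currentVersion = indexVersion.getAsLong();
        this.current = opener.get();
        this.lastCheckMillis = clockMillis.getAsLong();
    }

    /** Returns the shared searcher, reopening it if a due check finds a change. */
    synchronized S acquire() {
        long now = clockMillis.getAsLong();
        if (now - lastCheckMillis >= refreshIntervalMillis) {
            lastCheckMillis = now;
            long version = indexVersion.getAsLong();
            if (version != currentVersion) {
                currentVersion = version;
                // A real manager would also release the old searcher once unused.
                current = opener.get();
            }
        }
        return current;
    }
}
```

With this policy the interval bounds how often the directory is inspected, at the cost of queries seeing index updates up to one interval late.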
{noformat}
//With read limit set to 100
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Jackrabbit             1      4      4      5      6     71    12224
Jackrabbit             5      0      0      1      1    161   331510
Jackrabbit            10      0      0      1     10    254   174780
//shared searcher
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      6      6      7      7     42     8728
Oak-Tar                5      1      1      2      7     68    90120
Total read 1593592
//shared searcher/disable compression
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      3      4      5      6     48    12412
Oak-Tar                5      0      1      2      5    172    99472
//shared searcher/mlt off
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      3      3      4      4     15    16616
Oak-Tar                5      1      1      2      5     55   106068
Total read 3498539
//shared searcher/mlt off/disable compression
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      2      3      3      5     22    16287
Oak-Tar                5      0      1      2      5     58   109836
Total read 3827996
//shared searcher/mlt off/disable compression/local dir
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      1      2      2      3     61    27018
Oak-Tar                5      0      0      1      1     82   304053
Total read 7142948
Oak-Tar-FDS            1      1      2      2      4     90    24198
Oak-Tar-FDS            5      0      0      1      2    133   229287
Total read 13134162
//shared searcher/local dir
# FullTextSearchTest   C    min    10%    50%    90%    max        N
Oak-Tar                1      5      6      6     11     76     7656
Oak-Tar                5      0      0      1      2    231   226866
Total read 1822340
Oak-Tar-FDS            1      5      5      6      6     51    10163
Oak-Tar-FDS            5      0      0      1      2    128   228108
Total read 3948533
{noformat}
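As I read the table header, C is the number of concurrent threads, min/10%/50%/90%/max are per-execution timings (presumably in milliseconds) and N is the number of executions in the run. The sketch below shows how such a row can be derived from raw timings. It uses the nearest-rank percentile definition, which is an assumption: the benchmark harness may compute percentiles differently (for example with interpolation). `TimingSummary` is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Sketch: summarise per-execution timings into the columns
 * C, min, 10%, 50%, 90%, max and N. Assumes at least one timing.
 */
class TimingSummary {
    /** Nearest-rank percentile: smallest value with at least p% of samples <= it. */
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100;   // integer ceil(p * n / 100)
        return sorted[Math.max(rank, 1) - 1];
    }

    static String row(String fixture, int concurrency, long[] timings) {
        long[] s = timings.clone();                  // leave the caller's array untouched
        Arrays.sort(s);
        return String.format("%-20s%4d%7d%7d%7d%7d%7d%9d",
                fixture, concurrency,
                s[0], percentile(s, 10), percentile(s, 50), percentile(s, 90),
                s[s.length - 1], s.length);
    }
}
```

For the ten timings 1 to 10 this yields min 1, 10% 1, 50% 5, 90% 9, max 10 and N 10.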
[~tmueller], [~alexparvulescu], [~jukkaz], [~teofili] can you review the patch?
> Create a benchmark for Full text search
> ---------------------------------------
>
> Key: OAK-1702
> URL: https://issues.apache.org/jira/browse/OAK-1702
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: bench
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.1
>
> Attachments: OAK-1702-hack.patch, OAK-1702-lazy-cursor.patch,
> OAK-1702-shared-indexer-2.patch, OAK-1702-shared-indexer.patch,
> OAK-1702.oakcodec.patch, OAK-1702.patch
>
>
> To compare the performance of full text search between Jackrabbit 2 and Oak, a
> benchmark should be added.
> To start with, the benchmark would do the following:
> * Would be based on the WikipediaImport benchmark, so it would import the
> Wikipedia dump and perform full text queries on it
> * Should be able to run on both JR2 and Oak. Need to account for the Maven setup
> to handle different Lucene versions, as JR2 uses 3.6.0 and Oak uses 4.x
> Later we can add a concurrent version
--
This message was sent by Atlassian JIRA
(v6.2#6252)