Hi,

As part of OAK-1702 [1] I have added a benchmark to compare the performance of full text query search with JR2.
Based on the approach taken (which might be wrong), I get the following numbers:

Apache Jackrabbit Oak 0.21.0-SNAPSHOT
# FullTextSearchTest       C     min     10%     50%     90%     max       N
Oak-Mongo                  1      58      71     101     119     287     610
Oak-Mongo-FDS              1      50      51      52      58     184    1106
Oak-Tar                    1      39      40      40      44      64    1459
Oak-Tar-FDS                1      53      54      55      64     197    1030
Jackrabbit                 1       4       4       5       6     231   11385

This shows that JR2 performs a lot better for full text queries, and that subsequent queries are much faster once Lucene has warmed up.

Looking at the current usage of Lucene in Oak and the way we store and access the Lucene indexes [2], I have a couple of doubts:

1. Multiple IndexSearcher instances - The current implementation creates a new IndexSearcher for every Lucene query, because the OakDirectory it uses is bound to the NodeState of the executing JCR session. In JR2, by comparison, we probably had a singleton IndexSearcher shared across all query execution paths. This could cause performance issues, as Lucene is effectively used in a stateless way and has to perform its initialization on every call. As per [3], the IndexSearcher should be shared.

2. Index access - Currently we have a custom OakDirectory which provides access to Lucene indexes stored in the NodeStore. Even with SegmentStore, which uses memory mapped files, the random access used by Lucene would probably be a lot slower with OakDirectory than with the default Lucene MMapDirectory. For small setups where the Lucene index can be accommodated on each node, I think it would be better if the index were accessed from the file system.

Are the above concerns valid, and should we take another look at how we are using Lucene in Oak?

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/OAK-1702
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java
[3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
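
PS: To make point 1 concrete, here is a minimal sketch of reusing one searcher across queries until the index changes. It is not Oak or Lucene code: SharedSearcherHolder, the revision string and the opener function are all hypothetical names, and the type parameter S stands in for whatever searcher type would actually be shared (for example a Lucene IndexSearcher).

```java
import java.util.function.Function;

/**
 * Hypothetical holder (not part of Oak or Lucene): keeps one expensive
 * "searcher" per index revision and reuses it across queries.
 */
final class SharedSearcherHolder<S> {
    // Assumption: opening a searcher for a given revision is the costly step.
    private final Function<String, S> opener;
    private String currentRevision;
    private S current;
    private int openCount;

    SharedSearcherHolder(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening only when the revision changes. */
    synchronized S acquire(String revision) {
        if (current == null || !revision.equals(currentRevision)) {
            current = opener.apply(revision);
            currentRevision = revision;
            openCount++;
        }
        return current;
    }

    /** Number of times a searcher was actually opened. */
    synchronized int openCount() {
        return openCount;
    }
}
```

Repeated queries against the same revision would then share one instance, and only an index update would pay the open cost again. The sketch leaves out closing the previous searcher and any reference counting for in-flight queries, which a real implementation would need. Lucene's own SearcherManager class covers that use case.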