Hi, I agree with the idea to find a way to share the readers across threads.
Looking at the proposed patch I see a few problems: - the patch assumes that there is and will be a single lucene index directly under the root node, which may not necessarily be the case. I agree this assumption holds now, but I would not introduce any changes that take away this flexibility. - browsing through I notice that this only helps with concurrent threads, the call searcherManager.release translates into a decRef which means the readers will be closed if I'm not mistaken. This might explain the only marginal gain in perf. We should be looking for a more general optimization where we might leverage the fact that the index can be updated only each 5 seconds. I was thinking that we can use the initial NodeState from the index content node as a way to tell if it changed or not (using equals calls). It would work in the following way: first call, no state in the searchManager, take the provided NodeState (again random node state, the index could be on any node of the repo), build an index reader based on this, reuse it from how many threads you need. Cache this under path/NodeSate/IndexReader. On each subsequent call we can use the provided NodeState to check if the cache is stale or not: path + NodeState.equals. The biggest problem I see here is resource cleanup, as we'll not call decRef on each search call, we need a way to get notified when the application shuts down. Similar to Chetan's patch we can use a combo of 'Closeable' and '@Deactivate' but I'm not sure that will be enough outside OSGi. Take this with a grain of salt, I probably missed some aspects of the problem along the way. best, alex On Wed, Apr 9, 2014 at 10:43 AM, Chetan Mehrotra <[email protected]>wrote: > On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger <[email protected]> > wrote: > >> Since the Lucene index is in any case updated asynchronously, it > >> should be fine for us to ignore the base NodeState of the current > >> session and instead use an IndexSearcher based on the last state as > >> updated by the async indexer. This would allow us to reuse the > >> IndexSearcher over multiple queries. > > > > I was also wondering if it makes sense to share it across multiple > > sessions performing a query to reduce the number of index readers > > that may be open at the same time. however, this will likely also > > reduce concurrency because we synchronize access to a single > > session. > > I tried with one approach where I used a custom SerahcerManager based > on Lucene SearcherManager. It obtains the root NodeState directly from > NodeStore. As NodeStore can be accessed concurrently it should not > have any impact on session concurrency > > With this change there is a slight improvement > > Oak-Tar 1 39 40 40 44 > 64 1459 > Oak-Tar(Shared) 1 32 33 34 36 > 61 1738 > > So did not gave much boost (at least with approach taken). As I do not > have much understanding of Lucene internal can someone review the > approach taken and see if there are some major issues with it > > > Chetan Mehrotra > [1] > https://issues.apache.org/jira/secure/attachment/12639366/OAK-1702-shared-indexer.patch > [2] > https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/SearcherManager.html >
