Re: Slow full text query performance and Lucene Index handling in Oak

Alex Parvulescu Wed, 09 Apr 2014 02:31:26 -0700

Hi,

I agree with the idea to find a way to share the readers across threads.

Looking at the proposed patch I see a few problems:

 - the patch assumes that there is and will be a single lucene index
directly under the root node, which may not necessarily be the case. I
agree this assumption holds now, but I would not introduce any changes that
take away this flexibility.

 - browsing through I notice that this only helps with concurrent threads,
the call searcherManager.release translates into a decRef which means the
readers will be closed if I'm not mistaken.
This might explain the only marginal gain in perf.

We should be looking for a more general optimization where we might
leverage the fact that the index can be updated only each 5 seconds.
I was thinking that we can use the initial NodeState from the index content
node as a way to tell if it changed or not (using equals calls).
It would work in the following way: first call, no state in the
searchManager, take the provided NodeState (again random node state, the
index could be on any node of the repo), build an index reader based on
this, reuse it from how many threads you need. Cache this under
path/NodeSate/IndexReader.  On each subsequent call we can use the provided
NodeState to check if the cache is stale or not: path + NodeState.equals.

The biggest problem I see here is resource cleanup, as we'll not call
decRef on each search call, we need a way to get notified when the
application shuts down. Similar to Chetan's patch we can use a combo of
'Closeable' and '@Deactivate' but I'm not sure that will be enough outside
OSGi.

Take this with a grain of salt, I probably missed some aspects of the
problem along the way.

best,
alex

On Wed, Apr 9, 2014 at 10:43 AM, Chetan Mehrotra
<[email protected]>wrote:

> On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger <[email protected]>
> wrote:
> >> Since the Lucene index is in any case updated asynchronously, it
> >> should be fine for us to ignore the base NodeState of the current
> >> session and instead use an IndexSearcher based on the last state as
> >> updated by the async indexer. This would allow us to reuse the
> >> IndexSearcher over multiple queries.
> >
> > I was also wondering if it makes sense to share it across multiple
> > sessions performing a query to reduce the number of index readers
> > that may be open at the same time. however, this will likely also
> > reduce concurrency because we synchronize access to a single
> > session.
>
> I tried with one approach where I used a custom SerahcerManager based
> on Lucene SearcherManager. It obtains the root NodeState directly from
> NodeStore. As NodeStore can be accessed concurrently it should not
> have any impact on session concurrency
>
> With this change there is a slight improvement
>
> Oak-Tar                            1      39      40      40      44
>    64    1459
> Oak-Tar(Shared)                    1      32      33      34      36
>    61    1738
>
> So did not gave much boost (at least with approach taken). As I do not
> have much understanding of Lucene internal can someone review the
> approach taken and see if there are some major issues with it
>
>
> Chetan Mehrotra
> [1]
> https://issues.apache.org/jira/secure/attachment/12639366/OAK-1702-shared-indexer.patch
> [2]
> https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/SearcherManager.html
>

Re: Slow full text query performance and Lucene Index handling in Oak

Reply via email to