Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-10 Thread Thomas Mueller
Hi, I am not sure if Chetan's test case matches real-world usage, if Collections.sort takes up 23% of the performance... I have not seen Collections.sort in other profiling results at all (so I guess it was less than 1%). Also, I have seen that opening the Lucene index takes much more time in other tes

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
Current update: 1. Tommaso provided a patch (OAK-1702) to disable compression, and that also helps quite a bit. 2. Currently we are storing the full tokenized text in the Lucene index [1]. This would cause fetching of doc fields to be slower. On disabling the storage the numbers improve quite a bit. This

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Alex Parvulescu
Aside from the compression issue, there was another one related to the 'order by' clause. I saw Collections.sort taking up as much as 23% of the perf. I removed the order by temporarily so it doesn't get in the way of the Lucene stuff, but I think the QueryEngine should skip ordering results in thi

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Tommaso Teofili
I'm looking into the Lucene codecs right now. Tommaso 2014-04-09 15:20 GMT+02:00 Alex Parvulescu: > Profiling the result shows that quite a bit of time goes in > org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I > think is part of Lucene 4.x and not present in 3.x. Any idea i

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Alex Parvulescu
Profiling the result shows that quite a bit of time goes in org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I think is part of Lucene 4.x and not present in 3.x. Any idea if I can disable compression? +1 I noticed that too, we should try to disable compression and compare results

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting wrote: > Is that a common use case? To better simulate a normal usage scenario > I'd make the benchmark fetch up to N results (where N is configurable, > with default something like 20) and access the path and the title > property of the matching nodes

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Thomas Mueller
> >also, I wonder if we shouldn't also profile the stack of underlying calls >in the QueryEngine to measure how much time is spent there and how much >time is spent in the specific QueryIndex implementation. Analyzing full thread dumps will give you the statistical distribution, which is quite acc

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Tommaso Teofili
2014-04-09 13:44 GMT+02:00 Jukka Zitting: > Hi, > > On Wed, Apr 9, 2014 at 7:24 AM, Chetan Mehrotra > wrote: > > ... the testcase only fetches the first result. > > Is that a common use case? To better simulate a normal usage scenario > I'd make the benchmark fetch up to N results (where N is co

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Thomas Mueller
Hi, We have results from a different test case with multiple threads (internal id GRANITE-5572). We have 50 full thread dumps, and there I count: * 259 cases of LuceneIndex.java line 365: IndexReader reader = DirectoryReader.open(directory); * 43 cases of LuceneIndex.java line 379: TopDocs d
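
The counting described in this message (how often a given source line appears across a set of full thread dumps) could be reproduced with something like the following. This is only a rough sketch: the class name ThreadDumpFrameCounter and the marker format "File.java:line" are assumptions for illustration, not part of Oak or of the tooling used in the thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts how often given source locations show up in thread dumps. */
final class ThreadDumpFrameCounter {

    /**
     * Counts the stack-frame lines that contain each marker
     * (for example "LuceneIndex.java:365") across all given dumps.
     */
    static Map<String, Integer> count(List<String> dumps, List<String> markers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String marker : markers) {
            counts.put(marker, 0);
        }
        for (String dump : dumps) {
            for (String line : dump.split("\n")) {
                for (String marker : markers) {
                    if (line.contains(marker)) {
                        counts.merge(marker, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

The relative counts give a statistical picture of where threads spend their time, in the same spirit as the figures quoted above.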

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Jukka Zitting
Hi, On Wed, Apr 9, 2014 at 7:24 AM, Chetan Mehrotra wrote: > ... the testcase only fetches the first result. Is that a common use case? To better simulate a normal usage scenario I'd make the benchmark fetch up to N results (where N is configurable, with default something like 20) and access the
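
The suggestion to fetch up to N results rather than only the first could look roughly like this in a benchmark. It is a minimal sketch: ResultFetcher and fetchUpTo are hypothetical names, and the iterator stands in for whatever result iterator the real benchmark uses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical benchmark helper: reads at most `limit` results from a result iterator. */
final class ResultFetcher {

    /** Default taken from the suggestion in the thread ("something like 20"). */
    static final int DEFAULT_LIMIT = 20;

    /** Consumes results until the limit is reached or the iterator is exhausted. */
    static <T> List<T> fetchUpTo(Iterator<T> results, int limit) {
        List<T> fetched = new ArrayList<>();
        while (fetched.size() < limit && results.hasNext()) {
            fetched.add(results.next());
        }
        return fetched;
    }
}
```

In an actual benchmark each fetched item would also be accessed (for example its path), so that lazy loading costs are included in the measurement.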

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 3:00 PM, Alex Parvulescu wrote: > - the patch assumes that there is and will be a single lucene index > directly under the root node, which may not necessarily be the case. I > agree this assumption holds now, but I would not introduce any changes that > take away this flex

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Alex Parvulescu
Hi, I agree with the idea to find a way to share the readers across threads. Looking at the proposed patch I see a few problems: - the patch assumes that there is and will be a single lucene index directly under the root node, which may not necessarily be the case. I agree this assumption holds

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Chetan Mehrotra
On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger wrote: >> Since the Lucene index is in any case updated asynchronously, it >> should be fine for us to ignore the base NodeState of the current >> session and instead use an IndexSearcher based on the last state as >> updated by the async indexer.

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-09 Thread Thomas Mueller
Hi, Do we still have the option to store the Lucene files in the file system? If we have, maybe we could run the test with that option and see if it improves performance? I'm not suggesting this is a solution, it's just one step to better analyze things. And it might be easy to do. Regards, Thoma

RE: Slow full text query performance and Lucene Index handling in Oak

2014-04-08 Thread Marcel Reutegger
Hi, > Since the Lucene index is in any case updated asynchronously, it > should be fine for us to ignore the base NodeState of the current > session and instead use an IndexSearcher based on the last state as > updated by the async indexer. This would allow us to reuse the > IndexSearcher over mul
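
The idea quoted here, reusing one searcher for as long as the async indexer has not published a new state, might be sketched as below. All names are assumptions made for illustration: SharedSearcherCache is not an Oak class, the type parameter S stands in for a searcher such as Lucene's IndexSearcher, and the revision id is a hypothetical string identifying the last indexed state.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one searcher per indexed revision so that
 * queries can share it instead of opening a new one each time.
 */
final class SharedSearcherCache<S> {
    private final Function<String, S> opener; // assumed: opens a searcher for a revision id
    private String currentRevision;
    private S currentSearcher;

    SharedSearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening only when the indexed revision has changed. */
    synchronized S get(String latestIndexedRevision) {
        if (currentSearcher == null || !latestIndexedRevision.equals(currentRevision)) {
            // Note: this sketch does not close the previous searcher.
            currentSearcher = opener.apply(latestIndexedRevision);
            currentRevision = latestIndexedRevision;
        }
        return currentSearcher;
    }
}
```

A real implementation would also need to release the old searcher once no query is using it; that part is left out of the sketch.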

Re: Slow full text query performance and Lucene Index handling in Oak

2014-04-08 Thread Jukka Zitting
Hi, On Tue, Apr 8, 2014 at 11:51 AM, Chetan Mehrotra wrote: > 1. Multiple IndexSearcher instances - Current impl would create a new > IndexSearcher for every Lucene query as the OakDirectory it uses is bound > to the NodeState of the executing JCR session. Since the Lucene index is in any case updated asyn

Slow full text query performance and Lucene Index handling in Oak

2014-04-08 Thread Chetan Mehrotra
Hi, As part of OAK-1702 I have added a benchmark to compare the performance of full text query search with JR2. Based on the approach taken (which might be wrong) I get the following numbers: Apache Jackrabbit Oak 0.21.0-SNAPSHOT # FullTextSearchTest C min 10% 50% 90% max