Hi,
I'm not sure if Chetan's test case matches real-world usage, if
Collections.sort takes up 23% of the performance... I have not seen
Collections.sort in other profiling results at all (so I guess it was less
than 1%). Also, I have seen opening the Lucene index takes much more time
in other tes
Current update
1. Tommaso provided a patch (OAK-1702) to disable compression and that
also helps quite a bit
2. Currently we are storing the full tokenized text in Lucene Index
[1]. This would cause fetching of doc fields to be slower. On
disabling the storage, the numbers improve quite a bit. This
Aside from the compression issue, there was another one related to the
'order by' clause. I saw Collections.sort taking up as much as 23% of the
time.
I removed the order by temporarily so it doesn't get in the way of the
Lucene stuff, but I think the QueryEngine should skip ordering results in
thi
I'm looking into the Lucene codecs right now.
Tommaso
2014-04-09 15:20 GMT+02:00 Alex Parvulescu:
> Profiling the result shows that quite a bit of time goes in
> org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
> think is part of Lucene 4.x and not present in 3.x. Any idea i
Profiling the result shows that quite a bit of time goes in
org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
think is part of Lucene 4.x and not present in 3.x. Any idea if I can
disable compression?
+1 I noticed that too, we should try to disable compression and compare
results
On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting wrote:
> Is that a common use case? To better simulate a normal usage scenario
> I'd make the benchmark fetch up to N results (where N is configurable,
> with default something like 20) and access the path and the title
> property of the matching nodes
>
> Also, I wonder if we should profile the stack of underlying calls
> in the QueryEngine to measure how much time is spent there and how much
> time is spent in the specific QueryIndex implementation.
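A minimal sketch of the "fetch up to N results" idea quoted above, assuming the query results are exposed as a plain Iterator. The class and method names (FirstN, take) are hypothetical and are not part of Oak or the benchmark code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FirstN {

    /**
     * Reads at most 'limit' items from the iterator and leaves the
     * rest unconsumed, so a lazy result set is not fully evaluated.
     */
    public static <T> List<T> take(Iterator<T> results, int limit) {
        List<T> out = new ArrayList<>();
        while (out.size() < limit && results.hasNext()) {
            out.add(results.next());
        }
        return out;
    }
}
```

In a benchmark, the caller would then read the path and the title property of each returned item, with the limit defaulting to something like 20.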
Analyzing full thread dumps will give you the statistical distribution,
which is quite acc
2014-04-09 13:44 GMT+02:00 Jukka Zitting:
> Hi,
>
> On Wed, Apr 9, 2014 at 7:24 AM, Chetan Mehrotra
> wrote:
> > ... the testcase only fetches the first result.
>
> Is that a common use case? To better simulate a normal usage scenario
> I'd make the benchmark fetch up to N results (where N is co
Hi,
We have results from a different test case with multiple threads (internal
id GRANITE-5572). We have 50 full thread dumps, and there I count:
* 259 cases of LuceneIndex.java line 365:
IndexReader reader = DirectoryReader.open(directory);
* 43 cases of LuceneIndex.java line 379:
TopDocs d
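Counts like the ones above can be obtained by scanning the text of each thread dump for stack frame markers. The following is only a sketch of that idea, not the tool actually used; the class name FrameCounter and the marker strings are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrameCounter {

    /**
     * For each marker (for example "LuceneIndex.java:365)"), counts how
     * many lines across all dumps contain it. Keeping the trailing ')'
     * in the marker avoids matching longer line numbers such as 3650.
     */
    public static Map<String, Integer> count(List<String> dumps, List<String> markers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String marker : markers) {
            counts.put(marker, 0);
        }
        for (String dump : dumps) {
            // \R matches any line break sequence (Java 8 and later)
            for (String line : dump.split("\\R")) {
                for (String marker : markers) {
                    if (line.contains(marker)) {
                        counts.merge(marker, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

With enough dumps, the relative counts approximate where threads spend their time.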
Hi,
On Wed, Apr 9, 2014 at 7:24 AM, Chetan Mehrotra
wrote:
> ... the testcase only fetches the first result.
Is that a common use case? To better simulate a normal usage scenario
I'd make the benchmark fetch up to N results (where N is configurable,
with default something like 20) and access the
On Wed, Apr 9, 2014 at 3:00 PM, Alex Parvulescu
wrote:
> - the patch assumes that there is and will be a single lucene index
> directly under the root node, which may not necessarily be the case. I
> agree this assumption holds now, but I would not introduce any changes that
> take away this flex
Hi,
I agree with the idea to find a way to share the readers across threads.
Looking at the proposed patch I see a few problems:
- the patch assumes that there is and will be a single lucene index
directly under the root node, which may not necessarily be the case. I
agree this assumption holds
On Wed, Apr 9, 2014 at 12:25 PM, Marcel Reutegger wrote:
>> Since the Lucene index is in any case updated asynchronously, it
>> should be fine for us to ignore the base NodeState of the current
>> session and instead use an IndexSearcher based on the last state as
>> updated by the async indexer.
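The reuse idea quoted above could look roughly like the sketch below: keep one searcher per index state and hand it to all threads until the async indexer publishes a new state. This is an illustration under assumptions, not the proposed patch. SharedSearcherCache is a hypothetical name, and the type parameters S (state identifier) and R (searcher) are placeholders rather than Oak or Lucene classes. Closing the replaced searcher is left out.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class SharedSearcherCache<S, R> {

    /** Immutable pairing of an index state with the searcher opened for it. */
    private static final class Entry<S, R> {
        final S state;
        final R searcher;

        Entry(S state, R searcher) {
            this.state = state;
            this.searcher = searcher;
        }
    }

    private final AtomicReference<Entry<S, R>> current = new AtomicReference<>();
    private final Function<S, R> opener;

    /** 'opener' creates a searcher for a given state (the expensive step). */
    public SharedSearcherCache(Function<S, R> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher if the state is unchanged, else opens a new one. */
    public R get(S latestState) {
        Entry<S, R> entry = current.get();
        if (entry != null && entry.state.equals(latestState)) {
            return entry.searcher; // fast path, no locking
        }
        synchronized (this) {
            // Re-check under the lock so only one thread opens the new searcher
            entry = current.get();
            if (entry == null || !entry.state.equals(latestState)) {
                entry = new Entry<>(latestState, opener.apply(latestState));
                current.set(entry);
            }
            return entry.searcher;
        }
    }
}
```

A real implementation would also need to release the old searcher once no query is using it.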
Hi,
Do we still have the option to store the Lucene files in the file system?
If we have, maybe we could run the test with that option and see if it
improves performance? I'm not suggesting this is a solution, it's just one
step to better analyze things. And it might be easy to do.
Regards,
Thoma
Hi,
> Since the Lucene index is in any case updated asynchronously, it
> should be fine for us to ignore the base NodeState of the current
> session and instead use an IndexSearcher based on the last state as
> updated by the async indexer. This would allow us to reuse the
> IndexSearcher over mul
Hi,
On Tue, Apr 8, 2014 at 11:51 AM, Chetan Mehrotra
wrote:
> 1. Multiple IndexSearcher instances - Current impl would create a new
> IndexSearcher for every Lucene query as the OakDirectory used is bound
> to NodeState of executing JCR session.
Since the Lucene index is in any case updated asyn
Hi,
As part of OAK-1702 I have added a benchmark to compare the
performance of full text query search with JR2.
Based on the approach taken (which might be wrong), I get the following numbers:
Apache Jackrabbit Oak 0.21.0-SNAPSHOT
# FullTextSearchTest C min 10% 50% 90% max