[ 
https://issues.apache.org/jira/browse/OAK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965442#comment-13965442
 ] 

Alex Parvulescu commented on OAK-1702:
--------------------------------------

bq. For example, given Lucene's field cache, the compression overhead we're 
seeing now because of repeated initializations of the IndexSearcher might simply 
vanish once the key problem is addressed. 

I think we already had numbers showing that sharing the readers alone doesn't 
help that much. This also ignores the latest numbers published by Thomas, which 
already included the new codec. We can easily run the benchmark again without 
the codec and verify this assumption.
Digging even deeper, I think the decompression is actually caused by the fact 
that we are now storing everything in the index. Looking at that code from a 
distance, it looks like decompression happens only on stored fields, so why 
not take the stored fields out again? Then maybe Lucene doesn't need to 
decompress anything.

There is another aspect of this decompression issue. It looks like Lucene will 
decompress all the existing fields before accessing the one you need. We're 
only looking for the path, so why pay the price each time? I think it should be 
smart enough to skip the fields it doesn't need more efficiently, especially 
when the single field I'm interested in sits at the very beginning of the 
document and is the only field I'm asking for. Part of this could be optimized 
with a custom field visitor, which I wanted to address, but the decompression 
itself still remains an open issue.
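
To make the field visitor idea concrete, here is a rough sketch in plain Java. 
It is a toy model only: Status, FieldVisitor, PathVisitor and visit() are 
hypothetical names made up for illustration, not Lucene or Oak classes, and the 
field name "path" is an assumption. The map stands in for the stored fields of 
one document after they have been decompressed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene or Oak classes. */
final class PathOnlyVisitorSketch {

    /** Three-way answer a visitor can give for each stored field. */
    enum Status { YES, NO, STOP }

    /** Hypothetical callback, loosely shaped like a stored-field visitor. */
    interface FieldVisitor {
        Status needsField(String name);
        void stringField(String name, String value);
    }

    /** Collects only the path field and asks the reader to stop afterwards. */
    static final class PathVisitor implements FieldVisitor {
        private final String pathField;
        private String path;

        PathVisitor(String pathField) {
            this.pathField = pathField;
        }

        @Override
        public Status needsField(String name) {
            if (path != null) {
                return Status.STOP; // already have what we came for
            }
            return name.equals(pathField) ? Status.YES : Status.NO;
        }

        @Override
        public void stringField(String name, String value) {
            path = value;
        }

        String getPath() {
            return path;
        }
    }

    /**
     * Walks the stored fields in document order and returns the names of the
     * fields that were inspected before the visitor asked to stop.
     */
    static List<String> visit(Map<String, String> storedFields, FieldVisitor visitor) {
        List<String> inspected = new ArrayList<>();
        for (Map.Entry<String, String> field : storedFields.entrySet()) {
            Status status = visitor.needsField(field.getKey());
            if (status == Status.STOP) {
                break;
            }
            inspected.add(field.getKey());
            if (status == Status.YES) {
                visitor.stringField(field.getKey(), field.getValue());
            }
        }
        return inspected;
    }

    public static void main(String[] args) {
        // The path is stored first, as suggested above (field names are assumptions).
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("path", "/content/a");
        doc.put("title", "some title");
        doc.put("fulltext", "a large body of text");

        PathVisitor visitor = new PathVisitor("path");
        List<String> inspected = visit(doc, visitor);
        System.out.println(visitor.getPath() + " after inspecting " + inspected);
    }
}
```

With the path stored first, the walk stops after a single field. Note that in 
this model the whole map already exists before the visitor runs, which is 
exactly the part a visitor cannot fix: it avoids materializing the other 
fields, but not the decompression that precedes the walk.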




> Create a benchmark for Full text search
> ---------------------------------------
>
>                 Key: OAK-1702
>                 URL: https://issues.apache.org/jira/browse/OAK-1702
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: bench
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1
>
>         Attachments: OAK-1702-hack.patch, OAK-1702-lazy-cursor.patch, 
> OAK-1702-shared-indexer.patch, OAK-1702.oakcodec.patch, OAK-1702.patch
>
>
> To compare the performance of full text search between Jackrabbit 2 and Oak, a 
> benchmark should be added.
> To start with, the benchmark would do the following:
> * Would be based on the WikipediaImport benchmark, so it would import the 
> Wikipedia dump and perform full text queries on it
> * Should be able to run on both JR2 and Oak. The Maven setup needs to 
> handle different Lucene versions, as JR2 uses 3.6.0 and Oak uses 4.x
> Later we can add a concurrent version



--
This message was sent by Atlassian JIRA
(v6.2#6252)
