search performance

2014-06-02 Thread Jamie
Greetings. Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations search performance has reached the point where it is unacceptably slow. For instance, in one environment, the total index

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Olivier Binda
Hello, I'm still interested in having the answer to the following question: In a 1-segment read-only index (that is built offline once and then frozen), is it possible to remap the docIds? I may have a (working but not optimal) answer to my original problem: I may use a MultiReader and

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Michael McCandless
The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader, with your sort criteria, then use addIndexes(IR[]) to add it to a new index. That resulting index would have 1 segment and the docIDs would be in your order. Mike McCandless
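What "docIDs in your order" means can be shown without Lucene at all. The following is only a conceptual sketch in plain Java, not the lucene/misc sorting API: the class name `DocIdRemapSketch`, the method `oldToNew`, and the use of one `long` sort key per document are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: computes how docIDs would move
// if the documents of a single segment were rewritten in sort-key order.
class DocIdRemapSketch {

    // sortKeys[oldDocId] is the (assumed) sort key of that document.
    // Returns oldToNew, where oldToNew[oldDocId] is the docID the document
    // would get after sorting by ascending key.
    static int[] oldToNew(long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Arrays.sort on object arrays is stable, so documents with equal
        // keys keep their original relative order.
        Arrays.sort(newToOld, (Integer a, Integer b) -> Long.compare(sortKeys[a], sortKeys[b]));

        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }
}
```

For sort keys {30, 10, 20}, old docID 1 becomes new docID 0, old docID 2 becomes 1, and old docID 0 becomes 2. In the approach described above, the sorting reader presents documents in this kind of permuted order and addIndexes writes them out as a new single-segment index.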

Re: remapping docIds in a read only offline built index

2014-06-02 Thread Olivier Binda
Very nice! That is exactly what I needed. Thank you very much! On 06/02/2014 09:26 AM, Michael McCandless wrote: The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader, with your sort criteria, then use addIndexes(IR[]) to add it to a new index. That

Re: search performance

2014-06-02 Thread Tincu Gabriel
What kind of queries are you pushing into the index? Do they match a lot of documents? Do you do any sorting on the result set? What is the average document size? Do you have a lot of update traffic? What kind of schema does your index use? On Mon, Jun 2, 2014 at 6:51 AM, Jamie

Re: search performance

2014-06-02 Thread Jamie
Tom, thanks for the offer of assistance. On 2014/06/02, 12:02 PM, Tincu Gabriel wrote: What kind of queries are you pushing into the index. We are indexing regular emails + attachments. Typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08

Re: search performance

2014-06-02 Thread Jack Krupansky
Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are the queries compute-bound? You said you have a 128GB machine, so that sounds small for your index. Have you tried a

Re: search performance

2014-06-02 Thread Jamie
Jack, first off, thanks for applying your mind to our performance problem. On 2014/06/02, 1:34 PM, Jack Krupansky wrote: Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are

Re: MultiReader docid reliability

2014-06-02 Thread Nicola Buso
Hi Erick, the good reason for now is caching: we use them to store the results in cache, and I wanted a better explanation of ephemeral to understand the possible life of the cache. From the answers, ephemeral can be related to the opening of the indexreader (in general for precaution) and all

Re: search performance

2014-06-02 Thread Tincu Gabriel
MMapDirectory will do the job for you. RAMDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RAMDirectory and suitable for low update rates. MMap will use the system
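MMapDirectory builds on the JDK's memory-mapped file support. As a rough illustration of that underlying mechanism only, here is a plain-JDK sketch; it does not use Lucene, and the class name `MmapReadSketch` and both method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: reads a file through a read-only memory mapping.
class MmapReadSketch {

    // Maps the whole file read-only and decodes its bytes as UTF-8.
    // Pages are faulted in by the OS on access rather than copied
    // eagerly into the Java heap.
    static String readViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Writes text to a temporary file, then reads it back via the mapping.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return readViaMmap(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `MmapReadSketch.roundTrip("hello")` returns the same string it was given. Whether repeated reads are served from memory depends on how much of the file the operating system can keep cached, which is why the amount of free system memory relative to index size comes up in this thread.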

Re: search performance

2014-06-02 Thread Jamie
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64-bit platform is detected? Is this not the case? On 2014/06/02, 2:09 PM, Tincu Gabriel wrote: MMapDirectory will do the job for you. RAMDirectory has a big warning in the class description stating that

Re: search performance

2014-06-02 Thread Tincu Gabriel
My bad, it's using the RAMDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RAMDirectory to files that fit a certain size. So I guess the underlying Directory implementation will be whatever you choose it to be. I'd

Re: search performance

2014-06-02 Thread Jamie
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only obtains the page size * 3. It preloads the following and previous page too. The point is, it only obtains the documents that are needed. On 2014/06/02, 3:03 PM, Tincu Gabriel wrote: My bad, It's
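The "page size * 3" behaviour described above (current page plus the previous and following page) amounts to a small window calculation. The sketch below is an assumption about how such a window could be computed, not the poster's actual code; the class name `PageWindowSketch`, the zero-based page numbering, and the clamping rules are all hypothetical.

```java
// Hypothetical helper: which hit positions to load for a given results page.
class PageWindowSketch {

    // page is zero-based. Returns {fromInclusive, toExclusive}: the range of
    // hit positions covering the previous, current and next page, clamped to
    // [0, totalHits]. Only documents in this range need to be fetched.
    static int[] window(int page, int pageSize, int totalHits) {
        long from = Math.max(0L, ((long) page - 1) * pageSize);
        long to = Math.min((long) totalHits, ((long) page + 2) * pageSize);
        if (from > to) {
            from = to; // page lies beyond the last hit: empty window
        }
        return new int[] {(int) from, (int) to};
    }
}
```

With an assumed page size of 1000 and 50000 hits, page 5 loads positions 4000 to 7000, which is three pages' worth of documents. The first page loads only 0 to 2000, since it has no previous page.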

Re: search performance

2014-06-02 Thread Tri Cao
This is an interesting performance problem and I think there is probably not a single answer here, so I'll just lay out the steps I would take to tackle this: 1. What is the variance of the query latency? You said the average is 5 minutes, but is it due to some really bad queries or most queries
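The first step, telling a few very slow queries apart from uniformly slow ones, can be checked by comparing the mean against percentiles of logged latencies. This is a minimal sketch under the assumption that per-query latencies are available in milliseconds; the class name `LatencySummarySketch` and the sample numbers in the note below are invented for illustration and are not measurements from this thread.

```java
import java.util.Arrays;

// Hypothetical helper for summarising logged query latencies.
class LatencySummarySketch {

    // Arithmetic mean of the latencies, in milliseconds.
    static double mean(long[] latenciesMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long ms : latenciesMs) {
            sum += ms;
        }
        return sum / latenciesMs.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

For the made-up samples {100, 120, 110, 130, 9000} the mean is 1892 ms while the median is 120 ms. A gap like that would point to a few pathological queries rather than a uniformly slow index.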

Possible order violation in lucene library version 2.4.1

2014-06-02 Thread Swarnendu Biswas
Hi, I am working on a research project on data race detection, and am using the DaCapo benchmarks for evaluation. I am using the benchmark lusearch from the 2009 suite, which uses lucene library 2.4.1. For one test case, I am monitoring a pair of accesses say,