Greetings
Despite following all the recommended optimizations (as described at
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in some of
our installations, search performance has reached the point where it is
unacceptably slow. For instance, in one environment, the total index
Hello, I'm still interested in having the answer to the following question :
In a 1-segment read-only index (that is built offline once and then
frozen), is it possible to remap the docIds ?
I may have a (working but not optimal) answer to my original problem : I
may use a MultiReader and
The index sorting APIs (in lucene/misc) can do this. E.g. you could
make a SortingAtomicReader, with your sort criteria, then use
addIndexes(IR[]) to add it to a new index. That resulting index would
have 1 segment and the docIDs would be in your order.
Mike McCandless
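To picture what "the docIDs would be in your order" means, here is a minimal sketch of the remapping as a plain permutation. It does not use the Lucene API at all: the class name DocIdRemap, the method oldToNew, and the per-document sort keys are hypothetical and only illustrate the idea of assigning new docIDs by sort order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Conceptual illustration only; none of these names come from Lucene.
final class DocIdRemap {

    /**
     * Returns oldToNew, where oldToNew[oldDocId] is the docID a document
     * would get once all documents are reordered by ascending sort key.
     */
    static int[] oldToNew(long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Arrays.sort on objects is stable, so documents with equal keys
        // keep their original relative order.
        Arrays.sort(newToOld, Comparator.comparingLong(d -> sortKeys[d]));

        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        // Hypothetical sort keys (e.g. timestamps) for old docIDs 0, 1, 2.
        long[] keys = {30L, 10L, 20L};
        System.out.println(Arrays.toString(oldToNew(keys))); // prints [2, 0, 1]
    }
}
```

In the real index the sorting reader computes an equivalent mapping internally; the sketch only shows why a one-segment index can end up with docIDs that follow the sort criteria.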
Very nice ! That is exactly what I needed. Thank you very much !
On 06/02/2014 09:26 AM, Michael McCandless wrote:
The index sorting APIs (in lucene/misc) can do this. E.g. you could
make a SortingAtomicReader, with your sort criteria, then use
addIndexes(IR[]) to add it to a new index. That
What kind of queries are you pushing into the index? Do they match a lot of
documents ? Do you do any sorting on the result set? What is the average
document size ? Do you have a lot of update traffic ? What kind of schema
does your index use ?
On Mon, Jun 2, 2014 at 6:51 AM, Jamie
Tom
Thanks for the offer of assistance.
On 2014/06/02, 12:02 PM, Tincu Gabriel wrote:
What kind of queries are you pushing into the index?
We are indexing regular emails + attachments.
Typical query is something like:
filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08
Do you have enough system memory to fit the entire index in OS system memory
so that the OS can fully cache it instead of thrashing with I/O? Do you see
a lot of I/O or are the queries compute-bound?
You said you have a 128GB machine, so that sounds small for your index. Have
you tried a
Jack
First off, thanks for applying your mind to our performance problem.
On 2014/06/02, 1:34 PM, Jack Krupansky wrote:
Do you have enough system memory to fit the entire index in OS system
memory so that the OS can fully cache it instead of thrashing with
I/O? Do you see a lot of I/O or are
Hi Erick,
the good reason for now is caching, we use them to store the results in
cache, and I wanted a better explanation of ephemeral to understand
the possible life of the cache.
From the answers, ephemeral can be related to the opening of the
indexreader (in general for precaution) and all
MMapDirectory will do the job for you. RamDirectory has a big warning in
the class description stating that the performance will get killed by an
index larger than a few hundred MB, and NRTCachingDirectory is a wrapper
for RamDirectory and suitable for low update rates. MMap will use the
system
I was under the impression that NRTCachingDirectory will instantiate an
MMapDirectory if a 64 bit platform is detected? Is this not the case?
On 2014/06/02, 2:09 PM, Tincu Gabriel wrote:
MMapDirectory will do the job for you. RamDirectory has a big warning in
the class description stating that
My bad, it's using the RamDirectory as a cache and a delegate directory
that you pass in the constructor to do the disk operations, limiting the
use of the RamDirectory to files that fit a certain size. So I guess the
underlying Directory implementation will be whatever you choose it to be.
I'd
I assume you meant 1000 documents. Yes, the page size is in fact
configurable. However, it only obtains the page size * 3. It preloads
the following and previous page too. The point is, it only obtains the
documents that are needed.
On 2014/06/02, 3:03 PM, Tincu Gabriel wrote:
My bad, It's
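The "page size * 3" behaviour described above (current page plus the previous and following pages) comes down to a small range calculation. The sketch below is only an illustration under assumed names: PageWindow, range, and the zero-based page index are hypothetical and not taken from the application being discussed.

```java
// Conceptual illustration; the names and the zero-based page index are assumptions.
final class PageWindow {

    /**
     * Returns {start, end} (start inclusive, end exclusive) of the hits to
     * load for the given zero-based page: the previous page, the page
     * itself, and the following page, clamped to the available hits.
     */
    static int[] range(int page, int pageSize, int totalHits) {
        if (page < 0 || pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // Use long arithmetic so large page numbers cannot overflow int.
        long start = Math.max(0L, ((long) page - 1) * pageSize);
        long end = Math.min((long) totalHits, ((long) page + 2) * pageSize);
        start = Math.min(start, end); // page beyond the last hit: empty window
        return new int[] {(int) start, (int) end};
    }

    public static void main(String[] args) {
        int[] w = range(3, 50, 1000);
        System.out.println(w[0] + ".." + w[1]); // prints 100..250
    }
}
```

At most three pages of documents are loaded per request, regardless of how many hits the query matched.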
This is an interesting performance problem and I think there is probably not
a single answer here, so I'll just lay out the steps I would take to tackle this:
1. What is the variance of the query latency? You said the average is 5 minutes,
but is it due to some really bad queries or most queries
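One way to answer the variance question in step 1 is to compare the mean with a few percentiles of the recorded query latencies. This is a minimal sketch using the nearest-rank percentile; the class LatencyStats and the sample values in main are invented for illustration and are not measurements from this thread.

```java
import java.util.Arrays;

// Conceptual illustration; the class name and the sample data are made up.
final class LatencyStats {

    /** Arithmetic mean of the latencies, in the same unit as the input. */
    static double mean(long[] latencies) {
        if (latencies.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long l : latencies) {
            sum += l;
        }
        return sum / latencies.length;
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    static long percentile(long[] latencies, double p) {
        if (latencies.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latencies.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Invented latencies in milliseconds: four fast queries, one very slow.
        long[] ms = {100, 120, 110, 90, 5000};
        System.out.println("mean=" + mean(ms)
                + " p50=" + percentile(ms, 50)
                + " max=" + percentile(ms, 100));
    }
}
```

A mean far above the median, as in the invented sample, would point to a few very slow queries rather than uniformly slow ones.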
Hi,
I am working on a research project on data race detection, and am using the
DaCapo benchmarks for evaluation. I am using the benchmark lusearch from the
2009 suite, which uses lucene library 2.4.1.
For one test case, I am monitoring a pair of accesses say,