Hello, I'm still interested in having the answer to the following question :
In a 1-segment read-only index (that is built offline once and then
frozen), is it possible to remap the docIds ?
I may have a (working but not optimal) answer to my original problem : I
may use a MultiReader and 3
The index sorting APIs (in lucene/misc) can do this. E.g. you could
make a SortingAtomicReader, with your sort criteria, then use
addIndexes(IR[]) to add it to a new index. That resulting index would
have 1 segment and the docIDs would be in your order.
Mike McCandless
http://blog.mikemccandles
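
To make the remapping idea concrete: sorting an index amounts to computing a permutation from old docIDs to new ones. The following is a minimal plain-Java sketch of that permutation only. It does not use Lucene, and the class name DocIdRemap, the method oldToNew and the one-long-key-per-document input are hypothetical, chosen purely for illustration.

```java
import java.util.Arrays;

/** Illustrative only: a hypothetical helper, not part of Lucene. */
public final class DocIdRemap {
    private DocIdRemap() {}

    /**
     * Takes one sort key per document, indexed by old docID, and returns an
     * array oldToNew where oldToNew[oldDocId] is the docID after sorting.
     */
    public static int[] oldToNew(long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Arrays.sort on objects is stable, so equal keys keep their old relative order.
        Arrays.sort(newToOld, (a, b) -> Long.compare(sortKeys[a], sortKeys[b]));

        // Invert the permutation: position in sorted order becomes the new docID.
        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        long[] keys = {30L, 10L, 20L};
        System.out.println(Arrays.toString(oldToNew(keys))); // prints [2, 0, 1]
    }
}
```

In the approach described above, this mapping would be produced internally by the sorting reader from the sort criteria; the sketch only shows what "docIDs in your order" means.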
Very nice ! That is exactly what I needed. Thank you very much !
On 06/02/2014 09:26 AM, Michael McCandless wrote:
The index sorting APIs (in lucene/misc) can do this. E.g. you could
make a SortingAtomicReader, with your sort criteria, then use
addIndexes(IR[]) to add it to a new index. That
What kind of queries are you pushing into the index? Do they match a lot of
documents? Do you do any sorting on the result set? What is the average
document size? Do you have a lot of update traffic? What kind of schema
does your index use?
On Mon, Jun 2, 2014 at 6:51 AM, Jamie wrote:
> Gre
Tom
Thanks for the offer of assistance.
On 2014/06/02, 12:02 PM, Tincu Gabriel wrote:
What kind of queries are you pushing into the index.
We are indexing regular emails + attachments.
Typical query is something like:
filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08
deliver
Do you have enough system memory to fit the entire index in OS system memory
so that the OS can fully cache it instead of thrashing with I/O? Do you see
a lot of I/O or are the queries compute-bound?
You said you have a 128GB machine, so that sounds small for your index. Have
you tried a 256GB
Jack
First off, thanks for applying your mind to our performance problem.
On 2014/06/02, 1:34 PM, Jack Krupansky wrote:
Do you have enough system memory to fit the entire index in OS system
memory so that the OS can fully cache it instead of thrashing with
I/O? Do you see a lot of I/O or are t
Hi Erick,
the good reason for now is caching: we use them to store the results in
cache, and I wanted a better explanation of "ephemeral" to understand
the possible life of the cache.
From the answers, ephemeral can be related to the opening of the
indexreader (in general for precaution) and all
MMapDirectory will do the job for you. RamDirectory has a big warning in
the class description stating that the performance will get killed by an
index larger than a few hundred MB, and NRTCachingDirectory is a wrapper
for RamDirectory and suitable for low update rates. MMap will use the
system RAM
I was under the impression that NRTCachingDirectory will instantiate an
MMapDirectory if a 64 bit platform is detected? Is this not the case?
On 2014/06/02, 2:09 PM, Tincu Gabriel wrote:
MMapDirectory will do the job for you. RamDirectory has a big warning in
the class description stating that
My bad, it's using the RamDirectory as a cache and a delegate directory
that you pass in the constructor to do the disk operations, limiting the
use of the RamDirectory to files that fit a certain size. So I guess the
underlying Directory implementation will be whatever you choose it to be.
I'd sti
I assume you meant 1000 documents. Yes, the page size is in fact
configurable. However, it only obtains the page size * 3. It preloads
the following and previous page too. The point is, it only obtains the
documents that are needed.
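
As a rough illustration of the "page size * 3" behaviour described above, the sketch below computes which range of result offsets would be loaded for a given page: the previous page, the current page and the following page, clamped to the result set. The class name PageWindow, the method window and the 0-based page numbering are assumptions made for this example, not taken from the application being discussed.

```java
/** Illustrative only: a hypothetical helper showing the three-page preload window. */
public final class PageWindow {
    private PageWindow() {}

    /**
     * Returns {fromInclusive, toExclusive}: the result offsets to load for the
     * given 0-based page, covering the previous, current and following page.
     */
    public static int[] window(int page, int pageSize, int totalHits) {
        if (page < 0 || pageSize <= 0 || totalHits < 0) {
            throw new IllegalArgumentException("page >= 0, pageSize > 0, totalHits >= 0 required");
        }
        // Use long arithmetic so large page numbers cannot overflow.
        long from = Math.max(0L, (long) (page - 1) * pageSize);
        long to = Math.min((long) totalHits, (long) (page + 2) * pageSize);
        if (from > to) {
            from = to; // page lies beyond the end of the results: empty window
        }
        return new int[] {(int) from, (int) to};
    }

    public static void main(String[] args) {
        int[] w = window(5, 1000, 150_000);
        System.out.println(w[0] + ".." + w[1]); // prints 4000..7000
    }
}
```

The first page has no predecessor, so its window covers only two pages; at most pageSize * 3 documents are loaded in any case.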
On 2014/06/02, 3:03 PM, Tincu Gabriel wrote:
My bad, It's u
This is an interesting performance problem and I think there is probably not
a single answer here, so I'll just lay out the steps I would take to tackle this:
1. What is the variance of the query latency? You said the average is 5 minutes,
but is it due to some really bad queries or most queries h
Hi,
I am working on a research project on data race detection, and am using the
DaCapo benchmarks for evaluation. I am using the benchmark lusearch from the
2009 suite, which uses Lucene library 2.4.1.
For one test case, I am monitoring a pair of accesses say,
Lorg/apache/lucene/store/Dire
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote:
[200GB, 150M documents]
> With NRT enabled, search speed is roughly 5 minutes on average.
> The server resources are:
> 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux.
5 minutes is extremely long. Is that really the right number
Toke
Thanks for the comment.
Unfortunately, in this instance, it is a live production system, so we
cannot conduct experiments. The number is definitely accurate.
We have many different systems with a similar load that observe the same
performance issue. To my knowledge, the Lucene integrati
Can you take thread stacktraces (repeatedly) during those 5 minute
searches? That might give you (or someone on the mailing list) a clue
where all that time is spent.
You could try using jstack for that:
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html
Regards
Christoph
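
Where attaching jstack from outside is awkward, a similar repeated sample can be taken from inside the JVM. The following is a minimal sketch using the standard java.lang.management API. It is an alternative to jstack, not what was suggested above, and the class name StackSampler, its method names and the sampling parameters are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Illustrative only: a hypothetical in-process stack sampler. */
public final class StackSampler {
    private StackSampler() {}

    /** Returns one snapshot of all live threads with their stack frames. */
    public static String dumpOnce() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details to keep each sample cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Writes `count` snapshots to stderr, pausing intervalMillis between them. */
    public static void sample(int count, long intervalMillis) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            System.err.println("=== sample " + (i + 1) + " of " + count + " ===");
            System.err.print(dumpOnce());
            if (i + 1 < count) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        sample(3, 1000L); // three samples, one second apart
    }
}
```

Frames that show up in most samples during a slow search are the likely place where the time is being spent.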