Re: Combining results of multiple indexes

2009-01-22 Thread Preetham Kajekar
Hi, I thought I would share some more progress on this. This time I created two IndexWriters, each writing different documents to its own index (documents are split by whether an id, not the doc ID, is odd or even), and the performance seems to scale with the number of threads (and the
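
The odd/even split described above can be sketched in plain Java. This is only an illustration, not the poster's code: `PartitionedIndexing`, `partitionFor` and `indexAll` are hypothetical names, and each in-memory list stands in for one IndexWriter and its index, so no Lucene calls are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each list stands in for one IndexWriter writing to its own index.
class PartitionedIndexing {

    // Route a record by its application-level id (not the Lucene doc ID).
    static int partitionFor(long id, int partitions) {
        return (int) Math.floorMod(id, (long) partitions);
    }

    // "Index" ids 0..count-1 with one thread per partition and return what each partition holds.
    static List<List<Long>> indexAll(long count, int partitions) {
        List<List<Long>> indexes = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            indexes.add(new ArrayList<>());
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int part = p;
                pending.add(pool.submit(() -> {
                    // Each worker scans the source and keeps only its own share,
                    // so no two threads ever write to the same partition.
                    List<Long> own = indexes.get(part);
                    for (long id = 0; id < count; id++) {
                        if (partitionFor(id, partitions) == part) {
                            own.add(id);
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing failed", e);
        } finally {
            pool.shutdown();
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<List<Long>> indexes = indexAll(10, 2);
        System.out.println(indexes.get(0)); // [0, 2, 4, 6, 8]
        System.out.println(indexes.get(1)); // [1, 3, 5, 7, 9]
    }
}
```

Because every writer owns its partition exclusively, the threads need no shared lock; the cost is that a query has to consult all partitions.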

RE: Combining results of multiple indexes

2008-12-24 Thread Chris Hostetter
: a) once a doc is added to an index, it will not get modified/deleted
: b) all the fields added are keywords (mostly numbers) - no analysis is required.
: c) indexing speed is more important than querying speed.
: d) every document is the same - there is no boost or relevancy required.
:
: e

Re: Combining results of multiple indexes

2008-12-23 Thread Erick Erickson
-----Original Message----- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Friday, December 19, 2008 12:12 AM > To: java-user@lucene.apache.org > Subject: Re: Combining results of multiple indexes > > I would recommend, very strongly, that you don

Re: Re: RE: Combining results of multiple indexes

2008-12-22 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009

RE: Combining results of multiple indexes

2008-12-22 Thread Preetham Kajekar (preetham)
indexed. Thanks, ~preetham -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, December 19, 2008 12:12 AM To: java-user@lucene.apache.org Subject: Re: Combining results of multiple indexes I would recommend, very strongly, that you don't rely on th

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
I would recommend, very strongly, that you don't rely on the doc IDs being the same in two different indexes. Doc IDs are just incremented by one for each doc added, but optimization can change the doc ID, and is guaranteed to change at least some of them if there are deletions from your inde
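
One way to follow this advice is to join hits on an application-level key stored in every document rather than on the doc ID. The sketch below is plain Java under that assumption; `StableKeyJoin` and `intersect` are hypothetical names, and the key sets stand in for values read from a stored id field of each hit.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: combine hits from two indexes using a key the application
// controls, which stays stable even when Lucene renumbers doc IDs.
class StableKeyJoin {

    // Keep only the keys that matched in both indexes; the inputs are not modified.
    static Set<Long> intersect(Set<Long> keysFromIndexA, Set<Long> keysFromIndexB) {
        Set<Long> both = new TreeSet<>(keysFromIndexA);
        both.retainAll(keysFromIndexB);
        return both;
    }

    public static void main(String[] args) {
        Set<Long> a = new TreeSet<>(Arrays.asList(101L, 102L, 105L));
        Set<Long> b = new TreeSet<>(Arrays.asList(102L, 103L, 105L));
        System.out.println(intersect(a, b)); // [102, 105]
    }
}
```

Reading a stored field for every hit costs more than collecting raw doc IDs, so this trades some query speed for correctness across optimizations and deletions.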

Re: Combining results of multiple indexes

2008-12-18 Thread Michael McCandless
These results are surprising. I'd expect a single IndexWriter with 2 threads to do better than a single thread, but in your test two threads are significantly worse than one. Is it possible there's a bottleneck outside of Lucene in sourcing the documents? How many segments are produced a

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Hi, I noticed that the doc ID is the same. So, if I have a HitCollector that just collects the doc IDs from both Searchers (for the two indexes) and finds the intersection between them, it would work. Also, getting the doc IDs is fast even when there is a large number of hits. Of course, I am using somethin
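
The collect-and-intersect idea can be sketched with `java.util.BitSet`. This is a plain-Java illustration, not the poster's code: `DocIdIntersection`, `collect` and `intersect` are hypothetical names, `collect` stands in for whatever a HitCollector would record, and the approach only holds while both indexes assign the same doc ID to the same record.

```java
import java.util.BitSet;

// Hypothetical sketch: one bit per matching doc ID, intersected across two searchers.
class DocIdIntersection {

    // Stand-in for a collector callback: mark each matching doc ID as a set bit.
    static BitSet collect(int... docIds) {
        BitSet bits = new BitSet();
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    // Doc IDs present in both hit sets; the inputs are left untouched.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        return both;
    }

    public static void main(String[] args) {
        BitSet fromFirstIndex = collect(1, 3, 5, 8);
        BitSet fromSecondIndex = collect(3, 4, 8, 9);
        System.out.println(intersect(fromFirstIndex, fromSecondIndex)); // {3, 8}
    }
}
```

A bit set keeps the intersection cheap even for large hit counts, since `and` works a machine word at a time.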

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Thanks. Yep, the code is very easy. However, it takes about 3 mins to complete the merge. Looks like I will need to merge the indexes out of band once they are closed (I am planning to store about 50 million entries in each index partition). However, as the data is being indexed, is there any oth

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
You will be stunned at how easy it is. The merging code should be a dozen lines (and that only if you are merging 6 or so indexes). See IndexWriter.addIndexes or IndexWriter.addIndexesNoOptimize. Best, Erick. On Thu, Dec 18, 2008 at 5:03 AM, Preetham Kajekar wrote: > Hi, > I tried out a single

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Hi, I tried out a single IndexWriter used by two threads to index different fields. It is slower than using two separate IndexWriters. These are my findings:
All Fields (9) using 1 IndexWriter, 1 Thread - 38,000 objects per sec
5 Fields using 1 IndexWriter, 1 Thread - 62,000 objects per sec
A

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Thanks Erick and Michael. I will try out these suggestions and post my findings. ~preetham Erick Erickson wrote: Well, maybe if I'd read the original post more carefully I'd have figured that out, sorry 'bout that. I *think* I remember reading somewhere on the email lists that your indexing sp

Re: Combining results of multiple indexes

2008-12-17 Thread Erick Erickson
Well, maybe if I'd read the original post more carefully I'd have figured that out, sorry 'bout that. I *think* I remember reading somewhere on the email lists that your indexing speed goes up pretty linearly as the number of indexing tasks approaches the number of CPUs. Are you, perhaps, on a dua

Re: Combining results of multiple indexes

2008-12-17 Thread Michael McCandless
Have you tested your indexing throughput with two threads sharing one IndexWriter (one index)? Mike Preetham Kajekar wrote: Hi Erick, Thanks for the response. Replies inline. Erick Erickson wrote: The very first question is always "are you opening a new searcher each time you query"? But

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Hi Erick, Thanks for the response. Replies inline. Erick Erickson wrote: The very first question is always "are you opening a new searcher each time you query"? But you've looked at the Wiki so I assume not. This question is closely tied to what kind of latency you can tolerate. A few more deta

Re: Combining results of multiple indexes

2008-12-17 Thread Erick Erickson
The very first question is always "are you opening a new searcher each time you query"? But you've looked at the Wiki so I assume not. This question is closely tied to what kind of latency you can tolerate. A few more details, please. What's slow? Queries? Indexing? How slow? 100ms? 100s? What ar

Re: Combining results of multiple indexes

2008-12-17 Thread Preetham Kajekar
Hi Grant, Thanks for your response. Replies inline. Grant Ingersoll wrote: On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote: Hi, I am new to Lucene. I am not using it as a pure text indexer. I am trying to index a Java object which has about 10 fields (like id, time, srcIp, dstIp) - most of

Re: Combining results of multiple indexes

2008-12-17 Thread Grant Ingersoll
On Dec 17, 2008, at 12:57 AM, Preetham Kajekar wrote: Hi, I am new to Lucene. I am not using it as a pure text indexer. I am trying to index a Java object which has about 10 fields (like id, time, srcIp, dstIp) - most of them being numerical values. In order to speed up indexing, I figured t