Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi! In a project where we want to use Lucene, we are running into performance problems with regard to building filter sets. Let me give you a quick overview of what we need to do: We are indexing information about users (the index size ranges between 2 and 10 million documents). Each of th

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi Erick! On 13. Jan 2007, at 19:54 , Erick Erickson wrote: Before going off into modifying things, could you expand a bit on how you query to build up the filter? Perhaps providing a code snippet? We are passing in our unique ids from our database which we have to translate to lucene doc

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
Hi Karl! On 13. Jan 2007, at 20:12 , karl wettin wrote: On 13 Jan 2007, at 19:14, Kay Roepke wrote: All of the users (documents we index) are "connected" to certain other users, in a network fashion. We must be able to restrict the query (or filter it after searching the complete

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
On 14. Jan 2007, at 2:40 , Mark Miller wrote: First, have you looked at SwarmCache? Cluster aware caching for java... No, I haven't come across that one. I'll take a look, thanks! As a matter of fact, we do have a network-wide caching mechanism, so that's what we use. Second...does it ma

Re: Making document numbers persistent

2007-01-13 Thread Kay Roepke
On 14. Jan 2007, at 3:20 , Mark Miller wrote: Sorry Kay, I jumped in midstream...I should have read your first post more thoroughly. No problem, it was a bit lengthy, anyway...sorry about that. I just tried to give enough information so that people don't get confused too much. By the w

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 7:10 , Chris Hostetter wrote: if you're talking about multiple identical servers used for load balancing, then there is no reason why those indexes wouldn't be kept in sync (the merge model is deterministic, so if you apply the same operations to every server in the same

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 10:58 , karl wettin wrote: In the original post you mention 2-10 million documents. How much is that in bytes? On my development machine I have 1.5 million documents and those are weighing in at ~950MB. I suspect that for production we will add more fields, so it woul

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 8:51 , Doron Cohen wrote: I think that one effective way to control docid changes, assuming delete/update rate significantly lower than add rate, is to modify Lucene such that deleted docs are only 'squeezed out' when calling optimize(). This would involve delicate cha

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 3:54 , Erick Erickson wrote: 3> I doubt it really will make a performance difference, but you could use TermDocs.seek rather than get a new termdocs for each term from the reader. (and if this *does* make a difference, please let me know) It seems it does. I have just

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 17:46 , Erick Erickson wrote: I just love it when I get so wrapped up in a particular approach that alternatives don't occur to me. So I wondered what would happen if I just got stupid simple and tried solving what I think is your problem without involving lucene. So,

Re: Re : lucene document id's

2007-01-27 Thread Kay Roepke
Hi! I promised karl that I'd share something on this topic, so here it goes. It fits the subject, too ;) On Jan 27, 2007, at 6:14 PM, Erick Erickson wrote: I believe you are correct about when document IDs change. That said, I'd strongly recommend you spend some time trying to think of a way

Re: java gc with a frequently changing index?

2007-07-30 Thread Kay Roepke
Hi Tim! On Jul 25, 2007, at 8:41 PM, Tim Sturge wrote: I am indexing a set of constantly changing documents. The change rate is moderate (about 10 docs/sec over a 10M document collection with a 6G total size) but I want to be right up to date (ideally within a second but within 5 seconds