Hi!
For a project in which we want to use Lucene, we are running into
performance problems when building filter sets.
Let me give you a quick overview of what we need to do:
We are indexing information about users (the index will range between
2 and 10 million documents). Each of th
Hi Erick!
On 13. Jan 2007, at 19:54 , Erick Erickson wrote:
Before going off into modifying things, could you expand a bit on how
you query to build up the filter? Perhaps providing a code snippet?
We are passing in our unique ids from our database which we have to
translate to lucene doc
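A minimal sketch of that id-to-docid translation against the Lucene 2.x
API of the time, assuming the database ids are stored in a unique
keyword field named "uid" (the field name and the surrounding variables
are assumptions, not the original poster's code):

    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    // For each external id, look up the matching Lucene document and
    // set its bit in the filter's BitSet.
    BitSet bits = new BitSet(reader.maxDoc());
    for (String uid : uniqueIds) {
        TermDocs td = reader.termDocs(new Term("uid", uid)); // new TermDocs per id
        try {
            if (td.next()) {          // "uid" is unique, so at most one hit
                bits.set(td.doc());
            }
        } finally {
            td.close();
        }
    }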
Hi Karl!
On 13. Jan 2007, at 20:12 , karl wettin wrote:
On 13. Jan 2007, at 19:14, Kay Roepke wrote:
All of the users (documents we index) are "connected" to certain other
users, in a network fashion. We must be able to restrict the query (or
filter it after searching the complete
On 14. Jan 2007, at 2:40 , Mark Miller wrote:
First, have you looked at SwarmCache? Cluster-aware caching for Java...
No, I haven't come across that one. I'll take a look, thanks!
As a matter of fact, we do have a network-wide caching mechanism, so
that's what we use.
Second...does it ma
On 14. Jan 2007, at 3:20 , Mark Miller wrote:
Sorry Kay, I jumped in midstream...I should have read your first post
more thoroughly.
No problem, it was a bit lengthy anyway...sorry about that. I just
tried to give enough information so that people don't get confused too
much.
By the w
On 14. Jan 2007, at 7:10 , Chris Hostetter wrote:
if you're talking about multiple identical servers used for load
balancing, then there is no reason why those indexes wouldn't be kept
in sync (the merge model is deterministic, so if you apply the same
operations to every server in the same
On 14. Jan 2007, at 10:58 , karl wettin wrote:
In the original post you mention 2-10 million documents. How much is
that in bytes?
On my development machine I have 1.5 million documents and those weigh
in at ~950MB. I suspect that for production we will add more fields,
so it woul
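(As a rough back-of-envelope, assuming the index grows linearly with
document count: ~950MB over 1.5 million documents is about 650 bytes
per document, so 10 million documents would come to roughly 6.5GB
before any additional fields.)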
On 14. Jan 2007, at 8:51 , Doron Cohen wrote:
I think that one effective way to control docid changes, assuming the
delete/update rate is significantly lower than the add rate, is to
modify Lucene such that deleted docs are only 'squeezed out' when
calling optimize(). This would involve delicate cha
On 14. Jan 2007, at 3:54 , Erick Erickson wrote:
3> I doubt it really will make a performance difference, but you could
use TermDocs.seek rather than get a new TermDocs for each term from
the reader. (and if this *does* make a difference, please let me know)
It seems it does. I have just
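For reference, the reuse Erick suggests could look roughly like this
(same assumed "uid" field and variables as the sketch above):

    // Reuse one TermDocs and reposition it with seek() instead of
    // allocating a new TermDocs for every term.
    TermDocs td = reader.termDocs();
    try {
        for (String uid : uniqueIds) {
            td.seek(new Term("uid", uid));
            if (td.next()) {
                bits.set(td.doc());
            }
        }
    } finally {
        td.close();
    }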
On 14. Jan 2007, at 17:46 , Erick Erickson wrote:
I just love it when I get so wrapped up in a particular approach that
alternatives don't occur to me. So I wondered what would happen if I
just got stupid simple and tried solving what I think is your problem
without involving lucene.
So,
Hi!
I promised karl that I'd share something on this topic, so here goes.
It fits the subject, too ;)
On Jan 27, 2007, at 6:14 PM, Erick Erickson wrote:
I believe you are correct about when document IDs change. That said,
I'd strongly recommend you spend some time trying to think of a way
Hi Tim!
On Jul 25, 2007, at 8:41 PM, Tim Sturge wrote:
I am indexing a set of constantly changing documents. The change rate
is moderate (about 10 docs/sec over a 10M document collection with a
6G total size) but I want to be right up to date (ideally within a
second but within 5 seconds
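A pattern commonly used with Lucene at the time for staying near-real-
time was to periodically swap in a freshly opened searcher. A minimal
sketch, where the Directory, the refresh interval, and the lack of
reference counting for in-flight searches are all simplifying
assumptions:

    import java.io.IOException;
    import java.util.Timer;
    import java.util.TimerTask;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    // Refresh the searcher every few seconds so queries see recent
    // writes. Production code would reference-count readers so that
    // searches still running against the old searcher are not cut
    // off by close().
    class SearcherRefresher {
        volatile IndexSearcher current;
        private final Directory dir;

        SearcherRefresher(Directory dir) throws IOException {
            this.dir = dir;
            this.current = new IndexSearcher(dir); // opens its own IndexReader
        }

        void start(long intervalMillis) {
            new Timer(true).schedule(new TimerTask() {
                public void run() {
                    try {
                        IndexSearcher fresh = new IndexSearcher(dir);
                        IndexSearcher old = current;
                        current = fresh;
                        old.close(); // closes the reader it opened from dir
                    } catch (IOException e) {
                        // keep the old searcher if the reopen fails
                    }
                }
            }, intervalMillis, intervalMillis);
        }
    }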