Re: Efficient filtering advise

2009-11-24 Thread Eran Sevi
Erick, Thanks for all your help so far. I'll try to see whether upgrading to 2.9.1 would require too many changes on our side and whether it's stable enough. If upgrading won't work, I can fall back to TermsFilter and BooleanFilter from contrib, which should cover all my needs, and maybe it will even be faster t

Re: Efficient filtering advise

2009-11-23 Thread Erick Erickson
This was a really silly idea I had. If your time is being spent in the scoring in the first place, keeping the Filter out of the query and checking against it later in your Collector won't change the timing because you'll have done all the scoring anyway. But I only thought about it on the way hom

Re: Efficient filtering advise

2009-11-23 Thread Erick Erickson
See: http://issues.apache.org/jira/browse/LUCENE-1427 Short form: this is fixed, but not until 2.9. If you don't want to upgrade, you could always leave the Filter off your initial query and have your Collector ensure that any docs were in the Fil

Re: Efficient filtering advise

2009-11-23 Thread Eran Sevi
I've taken TermsFilter from contrib, which does exactly that, and indeed the run time was cut in half, which starts to be reasonable for my needs. I've researched the regular QueryFilter, and what I write here might not be the complete picture: I found out that most of the time is spent on scoring t

Re: Efficient filtering advise

2009-11-23 Thread Erick Erickson
Oh my goodness yes. No wonder nothing I suggested made any difference. Ignore everything I've written. OK, here's something to try, and it goes back to a Filter. Rather than make this enormous bunch of ORs, try creating a Filter. Use TermDocs to run through your list of IDs assembling a Filter
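The suggestion above (walk the list of IDs once and set a bit for every matching document, instead of OR-ing tens of thousands of clauses) can be sketched roughly as follows. This is only an illustration in plain Java: the `postings` map is a hypothetical stand-in for what TermDocs would return from the index, and `IdBitSetFilterSketch` and `buildFilter` are made-up names, not Lucene API.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdBitSetFilterSketch {

    // postings: hypothetical stand-in for the index (term value -> internal doc ids).
    // In Lucene the per-term doc ids would come from TermDocs; this map is an assumption
    // made so the sketch can run without the library.
    static BitSet buildFilter(Map<String, int[]> postings, List<String> ids, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : ids) {
            int[] docs = postings.get(id);
            if (docs == null) {
                continue; // ID not present in the index: nothing to mark
            }
            for (int doc : docs) {
                bits.set(doc); // one bit per matching document, no scoring involved
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("1001", new int[] {0});
        postings.put("1002", new int[] {3, 4});
        postings.put("1003", new int[] {7});

        // Only the documents for the requested IDs end up in the bit set.
        BitSet filter = buildFilter(postings, Arrays.asList("1001", "1003", "9999"), 8);
        System.out.println(filter);
    }
}
```

The work is proportional to the number of IDs plus the number of matching documents, and no per-clause scoring takes place, which is presumably why this approach was suggested for the 10K to 50K term case.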

Re: Efficient filtering advise

2009-11-23 Thread Eran Sevi
Erick, Maybe I didn't make myself clear enough. I'm talking about high level filters used when searching. I construct a very big BooleanQuery and add 50K clauses to it (I removed the limit on max clauses). Each clause is a TermQuery on the same field. I don't know the internal doc ids that I want

Re: Efficient filtering advise

2009-11-23 Thread Erick Erickson
Now I'm really confused, which usually means I'm making some assumptions that aren't true. So here they are... 1> You're talking about Filters that contain BitSets, right? Not some other kind of filter. 2> When you create your 10-50K filters, you wind up with a single filter by combining

Re: Efficient filtering advise

2009-11-23 Thread Eran Sevi
After commenting out the collector logic, the time is still more or less the same. Anyway, since collecting the documents without the filter is very fast, it's probably something with the filter itself. I don't know how the filter (or boolean query) works internally but probably for 10K or 50K claus

Re: Efficient filtering advise

2009-11-22 Thread Erick Erickson
Hmmm, could you show us what you do in your collector? Because one of the gotchas about a collector is loading the documents in the inner loop. Quick test: comment out whatever you're doing in the underlying collector loop, and see if there's *any* noticeable difference in speed. That'll tell you w

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot

Re: Efficient filtering advise

2009-11-22 Thread Eran Sevi
I think it shouldn't take 5 times longer, since the number of results is only about 2 times larger (and much smaller than the number of terms in the filter), but maybe I'm wrong here since I'm not familiar with the filter internals. Unfortunately, the time to construct the filter is mere millisec

Re: Efficient filtering advise

2009-11-22 Thread Erick Erickson
Hmmm, I'm not very clear here. Are you saying that you effectively form 10-50K filters and OR them all together? That would be consistent with the 50K case taking approx. 5X as long as the 10K case. Do you know where in your code the time is being spent? That'd be a big help in suggesting alter

Re: Efficient filtering advise

2009-11-22 Thread Eran Sevi

RE: Efficient filtering advise

2009-11-22 Thread Uwe Schindler
> -----Original Message-----
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Sunday, November 22, 2009 3:49 PM
> To: java-user@lucene.apache.org
> Subject: Efficient filtering advise
>
> Hi,
>
> I have a need to filter my queries using a rather large subset of terms

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot
Try a MultiTermQueryWrapperFilter instead of the QueryFilter. I'd expect a modest gain in performance. In case it is possible to form a few groups of terms that are reused, it could even be more efficient to also use a CachingWrapperFilter for each of these groups. Regards, Paul Elschot Op zonda
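The idea of caching a filter per reusable group of terms can be illustrated with a small sketch. It uses only plain Java collections as a stand-in for CachingWrapperFilter; `GroupFilterCacheSketch`, `forGroup`, `union`, and the `builder` function are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupFilterCacheSketch {
    // group name -> bit set of matching documents, built at most once per group
    private final Map<String, BitSet> cache = new HashMap<>();
    // builder: hypothetical function that computes the bit set for a group of terms
    private final Function<String, BitSet> builder;
    // counts how many times the (expensive) builder actually ran
    int builds = 0;

    GroupFilterCacheSketch(Function<String, BitSet> builder) {
        this.builder = builder;
    }

    // Returns the cached bit set for a group, building it on first use only.
    BitSet forGroup(String name) {
        BitSet bits = cache.get(name);
        if (bits == null) {
            bits = builder.apply(name);
            cache.put(name, bits);
            builds++;
        }
        return bits;
    }

    // ORs the cached group bit sets into a fresh result; cached sets are not modified.
    BitSet union(List<String> groups) {
        BitSet result = new BitSet();
        for (String group : groups) {
            result.or(forGroup(group));
        }
        return result;
    }
}
```

With this shape, a query that reuses groups seen before pays only for the bitwise OR, not for rebuilding each group's bit set.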

Efficient filtering advise

2009-11-22 Thread Eran Sevi
Hi, I have a need to filter my queries using a rather large subset of terms (can be 10K or even 50K). All these terms are sure to exist in the index, so the number of results can be about the same as the number of terms in the filter. The terms are numbers but are not consecutive and are from a large set o