Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Erick Erickson
OK, I'm feeling old today. But do any of you kids out there have any idea how miraculous this thread is? In "the bad old days", or "when I was your age", getting to the bottom of a problem like this would have involved on-site consultants at $150/hour and about 6 months. Assuming that the product

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
> > How do you handle stop words in phrase queries? ok, good point! You found another item for list of BADs... but not for me as we do not use phrase Qs to be honest, I do not even know how they are implemented... but no, there are no positions in such cache... well, they remain slowe

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Jason Rutherglen
be honest, I do not know if anyone today runs high volume search from disk > (maybe SSD), even then, significant portion has to be in RAM... > > One day we could throw many CPUs at Query... but this is not an easy one... > > > > > > - Original Message >> F

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
day, 16 July, 2009 19:22:28 > Subject: Re: speed of BooleanQueries on 2.9 > > Do we think that we'll be able to support indexing stop words > using PFOR (with relaxation on the compression to gain > performance?) Today it seems like the best approach to indexing > stop word

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Jason Rutherglen
ptimist without numbers to prove it). > > Cheers, Eks > > > > > > > > > - Original Message >> From: Michael McCandless >> To: java-user@lucene.apache.org >> Sent: Thursday, 16 July, 2009 16:23:57 >> Subject: Re: speed of BooleanQueries on 2.9 >>

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
a-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 16:23:57 > Subject: Re: speed of BooleanQueries on 2.9 > > Super, thanks for testing! > > And, the 10% speedup overall is good progress... > > Mike > > On Thu, Jul 16, 2009 at 9:16 AM, eks dev wrote: > > > > and o

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Michael McCandless
rom: eks dev >> To: java-user@lucene.apache.org >> Sent: Thursday, 16 July, 2009 14:40:26 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> >> ok new facts, less chaos :) >> >> - LUCENE-1744 fixed it definitely; I have it confirmed >> Also

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
up you will hear from me... Thanks again to all. Cheers, Eks - Original Message > From: eks dev > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 14:40:26 > Subject: Re: speed of BooleanQueries on 2.9 > > > ok new facts, less chaos :) >

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
0.28227 ZIPS:berien^0.25947002 ZIPS:berling^0.23232001 ZIPS:perlin^0.2615))^1.2) Thanks! - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 13:52:06 > Subject: Re: speed of BooleanQueries on 2.9 > > On Thu, J

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Michael McCandless
On Thu, Jul 16, 2009 at 6:38 AM, eks dev wrote: > and this String has exactly that form > (x OR y OR z) OR (a OR b OR c), > That is exactly how I construct the Query, have a look at brackets on this > toString result . Duh! OK, I had missed that your large query actually had 2 clauses at the to

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
his query stuck... later today I will roll back trunk to see which patch fixed it ... reduces number of puzzle pieces Cheers, Eks - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 11:47:34 > Subjec

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Michael McCandless
On Thu, Jul 16, 2009 at 5:21 AM, eks dev wrote: > Trace taken on trunk version (with fixed Yonik's bug and LUCENE-1744 that > fixed the problem somehow) Whoa, so LUCENE-1744 did in fact fix the problem? (I thought you had accidentally failed to setAllowDocsOutOfOrder(true) and that made us false

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
Trace taken on trunk version (with fixed Yonik's bug and LUCENE-1744 that fixed the problem somehow) full trace is too big (3.5Mb for this list), therefore only beginning and end: Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632 NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:mar

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
Michael McCandless > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 1:32:21 > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 7:13 PM, eks dev wrote: > > >>Are you sure when you ran the test you called > >> setAllowDocsOutO

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 7:13 PM, eks dev wrote: >>Are you sure when you ran the test you called >> setAllowDocsOutOfOrder(true)? > > right, just a second this is static... we have two indices, something > runs first and sets it to false... ouch, I hate statics... they make you > believe you
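The pitfall described here, a static setter shared by two indices in one JVM, can be sketched without Lucene. This is only an illustration under stated assumptions: `GlobalSwitch`, `IndexService` and `ScopedIndexService` are hypothetical names invented for the sketch, not Lucene classes, and the static field merely stands in for a JVM-wide flag such as the one behind `BooleanQuery.setAllowDocsOutOfOrder`.

```java
/** Hypothetical JVM-wide switch; a stand-in for any static setting, not Lucene code. */
class GlobalSwitch {
    private static boolean allowOutOfOrder = false;

    static void setAllowOutOfOrder(boolean allow) { allowOutOfOrder = allow; }

    static boolean isAllowOutOfOrder() { return allowOutOfOrder; }
}

/** Hypothetical per-index service that configures the shared switch on startup. */
class IndexService {
    final String name;

    IndexService(String name, boolean wantsOutOfOrder) {
        this.name = name;
        // Overwrites whatever any other index configured earlier.
        GlobalSwitch.setAllowOutOfOrder(wantsOutOfOrder);
    }

    /** Reads the shared value, which may no longer be what this index asked for. */
    boolean effectiveOutOfOrder() { return GlobalSwitch.isAllowOutOfOrder(); }
}

/** Hypothetical alternative: each index keeps its own copy of the setting. */
class ScopedIndexService {
    final String name;
    private final boolean allowOutOfOrder;

    ScopedIndexService(String name, boolean allowOutOfOrder) {
        this.name = name;
        this.allowOutOfOrder = allowOutOfOrder;
    }

    boolean effectiveOutOfOrder() { return allowOutOfOrder; }
}

class StaticFlagDemo {
    public static void main(String[] args) {
        IndexService names = new IndexService("names", true);
        IndexService zips = new IndexService("zips", false);
        // Both lines report the value written last, regardless of what "names" requested.
        System.out.println(names.name + " sees " + names.effectiveOutOfOrder());
        System.out.println(zips.name + " sees " + zips.effectiveOutOfOrder());
    }
}
```

With a static, whichever component writes last decides the behaviour for every index, so a test can silently run with the opposite setting from the one it believes it has. The instance-scoped variant avoids that at the cost of passing the setting around explicitly.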

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
warmduscher :) good night - Original Message > From: Uwe Schindler > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 1:06:30 > Subject: RE: speed of BooleanQueries on 2.9 > > Same here, too late! Good night! > And the blood glucose level is ver

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
e in a couple of minutes. - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 0:50:28 > Subject: Re: speed of BooleanQueries on 2.9 > > I think that query should rewrite to a BQ that would in turn use BS.

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
to:luc...@mikemccandless.com] > Sent: Thursday, July 16, 2009 12:59 AM > To: java-user@lucene.apache.org > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 6:52 PM, eks dev wrote: > > > Also not really expected, but this query runs over BS2, shouldn't  

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 6:52 PM, eks dev wrote: > Also not really expected, but this query runs over BS2, shouldn't  +( > whatewer whatever1...)  run as BS? what does it mean to have MUST +() at the > top level? Your query is +(((X Y Z))^2). In BQ.rewrite, any single-clause query that hasn't h

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
ssage > From: Uwe Schindler > To: java-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Thursday, 16 July, 2009 0:35:25 > Subject: RE: speed of BooleanQueries on 2.9 > > There is also this one: https://issues.apache.org/jira/browse/LUCENE-1744 > >

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
rom yesterday, the same symptoms like yesterday... >> >> Mike's instrumented version is running ... >> >> >> >> - Original Message >> > From: Yonik Seeley >> > To: java-user@lucene.apache.org >> > Sent: Wednesday, 15 Jul

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
om] On Behalf Of Yonik > Seeley > Sent: Thursday, July 16, 2009 12:06 AM > To: java-user@lucene.apache.org > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 5:57 PM, eks dev wrote: > > it works with current trunk, 10 Minutes ago built?! > >

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
om] On Behalf Of Yonik > Seeley > Sent: Thursday, July 16, 2009 12:06 AM > To: java-user@lucene.apache.org > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 5:57 PM, eks dev wrote: > > it works with current trunk, 10 Minutes ago built?! > > Hmmm,

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
g >> Sent: Wednesday, 15 July, 2009 23:34:29 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler wrote: >> > And the fix only affects custom DocIdSetIterators. >> >> And custom Queries (via Scorer) since Scorer

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
)^2.0) - Original Message > From: eks dev > To: java-user@lucene.apache.org; yo...@lucidimagination.com > Sent: Wednesday, 15 July, 2009 23:57:22 > Subject: Re: speed of BooleanQueries on 2.9 > > > > it works with current trunk, 10 Minutes ago built?! >

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
2009 23:34:29 > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler wrote: > > And the fix only affects custom DocIdSetIterators. > > And custom Queries (via Scorer) since Scorer inherits from DISI. > But as Mike says, it should

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler wrote: > And the fix only affects custom DocIdSetIterators. And custom Queries (via Scorer) since Scorer inherits from DISI. But as Mike says, it shouldn't be the issue behind in this thread. -Yonik http://www.lucidimagination.com --

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
I do, but not on this Query... the same happens when I use Luke - Original Message > From: Uwe Schindler > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 22:37:04 > Subject: RE: speed of BooleanQueries on 2.9 > > And the fix only affects custom

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
- Original Message >>> From: Michael McCandless >>> To: java-user@lucene.apache.org >>> Sent: Wednesday, 15 July, 2009 20:54:25 >>> Subject: Re: speed of BooleanQueries on 2.9 >>> >>> On Wed, Jul 15, 2009 at 2:30 PM, eks dev wrote: >>

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Wednesday, July 15, 2009 10:25 PM > To: java-user@lucene.apache.org; yo...@lucidimagination.com > Subject: Re: speed of BooleanQueries on 2.9 > > I just committ

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
>> From: Michael McCandless >> To: java-user@lucene.apache.org >> Sent: Wednesday, 15 July, 2009 20:54:25 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> On Wed, Jul 15, 2009 at 2:30 PM, eks dev wrote: >> > >> >> Weird.  Have you run Check

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
I just committed Uwe's fix for that (thanks Uwe!), but I don't think it's causing eks' slowdown because eks' case is a straight OR query, which doesn't use advance. Mike On Wed, Jul 15, 2009 at 3:23 PM, Yonik Seeley wrote: > Could this perhaps have anything to do with the changes to DocIdSetItera

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
alf Of Yonik > Seeley > Sent: Wednesday, July 15, 2009 9:23 PM > To: java-user@lucene.apache.org > Subject: Re: speed of BooleanQueries on 2.9 > > Could this perhaps have anything to do with the changes to > DocIdSetIterator? > Glancing at the default implementation o

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
> If I make a patch that adds verbosity to what BS is doing, can you run > it & post the output? can do, it can take some time - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 20:54:25 >

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
Could this perhaps have anything to do with the changes to DocIdSetIterator? Glancing at the default implementation of advance makes me wince a bit:

    public int advance(int target) throws IOException {
      while (nextDoc() < target) {}
      return doc;
    }

IMO, this is a back-compatibility anti-pa
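To make the concern about that default concrete, here is a minimal sketch, not Lucene's actual classes, of why an `advance` built on a `nextDoc()` loop costs one call per skipped document while an iterator with random access can jump. `SketchIterator`, `ArrayIterator` and `SkippingArrayIterator` are hypothetical names, and the sorted `int[]` of doc ids is an assumption made purely so the example is self-contained.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a doc-id iterator; not Lucene's DocIdSetIterator. */
abstract class SketchIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    protected int doc = -1;
    int nextDocCalls = 0; // instrumentation only, to make the cost visible

    abstract int nextDoc();

    /** Default: linear scan, the same shape as the loop quoted in the message. */
    int advance(int target) {
        while (nextDoc() < target) {}
        return doc;
    }
}

/** Iterates a sorted array of doc ids and inherits the linear advance. */
class ArrayIterator extends SketchIterator {
    protected final int[] docs; // sorted ascending, no duplicates
    protected int pos = -1;

    ArrayIterator(int[] docs) { this.docs = docs; }

    @Override
    int nextDoc() {
        nextDocCalls++;
        pos++;
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }
}

/** Same data, but advance jumps with a binary search instead of scanning. */
class SkippingArrayIterator extends ArrayIterator {
    SkippingArrayIterator(int[] docs) { super(docs); }

    @Override
    int advance(int target) {
        int from = pos + 1; // always move forward at least one position
        if (from >= docs.length) {
            pos = docs.length;
            return doc = NO_MORE_DOCS;
        }
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        if (idx < 0) idx = -idx - 1; // insertion point: first doc greater than target
        pos = idx;
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }
}

class AdvanceDemo {
    public static void main(String[] args) {
        int[] docs = {1, 5, 9, 200, 1000};
        ArrayIterator linear = new ArrayIterator(docs);
        SkippingArrayIterator skipping = new SkippingArrayIterator(docs);
        System.out.println(linear.advance(200) + " after " + linear.nextDocCalls + " nextDoc calls");
        System.out.println(skipping.advance(200) + " after " + skipping.nextDocCalls + " nextDoc calls");
    }
}
```

Both iterators land on the same document; the difference is only in how many positions they touch on the way. As the replies in the thread note, a pure OR query does not call advance, so this cost would not by itself explain the slowdown being chased here.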

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 2:30 PM, eks dev wrote: > >> Weird.  Have you run CheckIndex? > nope, I guess it brings nothing: two times built index; Bug provoked by > changing one parameter  that controls only search caused it => no corrupt > index? > > You think we should give it a try? Hell, why not

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
> >> > I do not know exactly why, but >> >> > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, >> >> > but with setAllowDocsOutOfOrder(false);  no problems whatsoever >> >> > >> >> > not really scientific method to find such bug,

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
r freq. terms, lower... everything fine... bizarre - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 19:57:09 > Subject: Re: speed of BooleanQueries on 2.9 > > OK thanks for the updates. Yes, we are o

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
tsoever > >> > > >> > not really scientific method to find such bug, but does the job and > >> > makes me happy. > >> > > >> > Empirical, "deprecated methods are not to be taken as thoroughly tested, > >> > as they have short

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
ard, but please stay with me, we will fix one ugly bug :) > > > > > > > > - Original Message ---- >> From: Michael McCandless >> To: java-user@lucene.apache.org >> Sent: Wednesday, 15 July, 2009 19:27:24 >> Subject: Re: speed of BooleanQueries on 2

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
- Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 19:30:42 > Subject: Re: speed of BooleanQueries on 2.9 > > On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote: > > >

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
rg > Sent: Wednesday, 15 July, 2009 19:27:24 > Subject: Re: speed of BooleanQueries on 2.9 > > But, that query can't accept a minNumberShouldMatch -- are you really > setting that? (You get 0 results if you set it, because the top > boolean query has a single required clause).

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote: > You see it on stack trace taken while "stuck" > o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource) Is it possible for you to make the problem happen such that we get line numbers in this traceback? Is CPU pe

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
073482 NAME:plocharski^0.21168004 NAME:pokarski^0.20172001 > NAME:polikarski^0.20172001 NAME:pukarski^0.20172001 NAME:pyekarska^0.26508 > NAME:siekarski^0.20281483))^2.0) > > > > > > - Original Message >> From: Michael McCandless >> To: java-user@lucene.apa

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
> > Empirical, "deprecated methods are not to be taken as thoroughly tested, >> > as they have short life expectancy" >> > >> > >> > >> > >> > >> > - Original Message >> >> From: eks dev >> >> T

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
, as > > they have short life expectancy" > > > > > > > > > > > > ----- Original Message ---- > >> From: eks dev > >> To: java-user@lucene.apache.org > >> Sent: Wednesday, 15 July, 2009 0:24:43 > >> Subject: Re: s

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
0.20172001 NAME:pyekarska^0.26508 NAME:siekarski^0.20281483))^2.0) - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 17:16:23 > Subject: Re: speed of BooleanQueries on 2.9 > > So now I'm confused. Since yo

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Tue, Jul 14, 2009 at 6:24 PM, eks dev wrote: > org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown > Source) > org.apache.lucene.search.BooleanScorer.score(Unknown Source) > org.apache.lucene.search.BooleanScorer.score(Unknown Source) > org.apache.lucen

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
>> To: java-user@lucene.apache.org >> Sent: Wednesday, 15 July, 2009 0:24:43 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> >> Mike, we are definitely hitting something with this one! >> >> we had report from our QA chaps that our servers got stuck

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
he.org > Sent: Wednesday, 15 July, 2009 13:30:22 > Subject: Re: speed of BooleanQueries on 2.9 > > On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote: > > > > I do not know exactly why, but > > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but >

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote: > > I do not know exactly why, but > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but > with setAllowDocsOutOfOrder(false);  no problems whatsoever That toggles between using BooleanScorer vs BooleanScorer2. The odd thing i

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
ssage >> From: eks dev >> To: java-user@lucene.apache.org >> Sent: Wednesday, 15 July, 2009 0:24:43 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> >> Mike, we are definitely hitting something with this one! >> >> we had report from ou

Re: speed of BooleanQueries on 2.9

2009-07-14 Thread eks dev
ot to be taken as thoroughly tested, as they have short life expectancy" - Original Message > From: eks dev > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 0:24:43 > Subject: Re: speed of BooleanQueries on 2.9 > > > Mike, we are definit

Re: speed of BooleanQueries on 2.9

2009-07-14 Thread eks dev
earch(Unknown Source) org.apache.lucene.search.Searcher.search(Unknown Source) - Original Message > From: eks dev > To: java-user@lucene.apache.org > Sent: Monday, 13 July, 2009 13:28:45 > Subject: Re: speed of BooleanQueries on 2.9 > > Hi Mike, > > getMa

Re: speed of BooleanQueries on 2.9

2009-07-13 Thread eks dev
Subject: Re: speed of BooleanQueries on 2.9 > > This is not expected; 2.9 has had a number of changes that ought to > reduce CPU cost of searching. If this holds up we definitely need to > get to the root cause. > > Did your test exclude the warmup query for both 2.4.1 & 2.9?

Re: speed of BooleanQueries on 2.9

2009-07-13 Thread Michael McCandless
This is not expected; 2.9 has had a number of changes that ought to reduce CPU cost of searching. If this holds up we definitely need to get to the root cause. Did your test exclude the warmup query for both 2.4.1 & 2.9? How many segments in the index? What is the actual value of getMaxNumOfCan