Re: Big slowdown with phrase queries

2008-07-16 Thread Chris Harris
For the future reference of anyone looking for help on this topic, this thread http://www.nabble.com/Phrase-Query-Performance-Question-td13422183.html discusses a phrase performance problem raised by Haishan Chen in October 2007. If you find this thread helpful you should probably consult that

Re: Big slowdown with phrase queries

2008-07-12 Thread Chris Harris
Just for fun, I ran my CPU profiler against Solr while running a little app to hammer Solr with phrase queries. As Mike and Yonik could have predicted, I found that Solr was spending basically all its time in lucene.index.MultiSegmentReader$MultiTermPositions code (called from

Re: Big slowdown with phrase queries

2008-07-12 Thread Walter Underwood
On 7/12/08 7:00 PM, Chris Harris wrote:
Mike, your idea of indexing bigrams is also interesting. Do you know if any text search platforms do this behind the scenes as their default way of handling phrase queries?
Infoseek indexed biwords with their Ultra engine, which lives
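
For readers unfamiliar with the bigram (biword) idea mentioned above, here is a minimal sketch in plain Python. It is not how Solr, Lucene, or Infoseek implement it; the function names (`biwords`, `build_biword_index`, `phrase_candidates`) and the sample documents are hypothetical and exist only for illustration. The point is that each adjacent word pair becomes its own index term, so a two-word phrase turns into a single term lookup instead of a position comparison.

```python
from collections import defaultdict


def biwords(tokens):
    """Return every adjacent token pair, joined by a single space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_biword_index(docs):
    """Map each biword to the set of document ids that contain it.

    `docs` is a dict of {doc_id: text}; tokenization is a naive
    lowercase whitespace split (an assumption made for this sketch).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bw in biwords(text.lower().split()):
            index[bw].add(doc_id)
    return index


def phrase_candidates(index, phrase):
    """Return ids of documents containing every biword of the phrase.

    For two-word phrases the result is exact. For longer phrases it is
    only a candidate set, because the biwords may occur in different
    places in a document and would still need a position check.
    """
    pairs = biwords(phrase.lower().split())
    if not pairs:
        return set()
    result = set(index.get(pairs[0], set()))
    for bw in pairs[1:]:
        result &= index.get(bw, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "big slowdown with phrase queries",
    2: "phrase queries can be slow",
    3: "queries with a big phrase",
}
index = build_biword_index(docs)
matches = phrase_candidates(index, "phrase queries")
```

The trade-off is a larger term dictionary (one entry per distinct word pair) in exchange for avoiding position lookups on common two-word phrases.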

Re: Big slowdown with phrase queries

2008-07-03 Thread Yonik Seeley
On Thu, Jul 3, 2008 at 6:04 PM, Chris Harris wrote:
Now I gather that phrase queries are inherently slower than non-phrase queries, but 1-3 orders of magnitude difference seems noteworthy.
Phrase queries could be a couple times slower, but normally not to the degree you show

Re: Big slowdown with phrase queries

2008-07-03 Thread Mike Klaas
On 3-Jul-08, at 3:04 PM, Chris Harris wrote: Now I gather that phrase queries are inherently slower than non-phrase queries, but 1-3 orders of magnitude difference seems noteworthy. This is on Solr r654965, which I don't think is *too* far behind the trunk version. 1200 MB RAM allocated to

Re: Big slowdown with phrase queries

2008-07-03 Thread Chris Harris
Ok, I only have one segment right now, so I've got one of each of these:
.tis file: 730 MB
.frq files: 9 KB
.prx file: 26 KB
If I'm understanding you (and Mike) properly, then even though it's the prx file that contains the actual position info, you can't get to that info quickly unless the tis

Re: Big slowdown with phrase queries

2008-07-03 Thread Yonik Seeley
On Thu, Jul 3, 2008 at 7:05 PM, Chris Harris wrote:
Ok, I only have one segment right now, so I've got one of each of these:
.tis file: 730 MB
.frq files: 9 KB
.prx file: 26 KB
That's pretty much impossible (way too small). Double check those numbers.
If I'm understanding

Re: Big slowdown with phrase queries

2008-07-03 Thread Chris Harris
On Thu, Jul 3, 2008 at 4:35 PM, Yonik Seeley wrote:
On Thu, Jul 3, 2008 at 7:05 PM, Chris Harris wrote:
Ok, I only have one segment right now, so I've got one of each of these:
.tis file: 730 MB
.frq files: 9 KB
.prx file: 26 KB
That's pretty much

Re: Big slowdown with phrase queries

2008-07-03 Thread Mike Klaas
On 3-Jul-08, at 5:13 PM, Chris Harris wrote:
That's pretty much impossible (way too small). Double check those numbers.
I don't know where I got the above numbers. Sorry. Here are the real numbers:
.tis file: 730 MB
.frq files: 10.1 GB
.prx file: 43.2 GB
Now keeping all *that* in RAM,
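
To put the corrected file sizes in perspective, here is a small back-of-the-envelope sketch in Python. It assumes 1 GB = 1024 MB and compares the total against the 1200 MB RAM figure quoted earlier in the thread; what exactly that RAM was allocated to is cut off in the snippet, so treat the comparison as illustrative only. The variable names are made up for this example.

```python
# File sizes as reported in the thread, converted to MB.
# Assumption: 1 GB = 1024 MB.
MB_PER_GB = 1024

index_files_mb = {
    ".tis": 730.0,             # term dictionary
    ".frq": 10.1 * MB_PER_GB,  # term frequencies
    ".prx": 43.2 * MB_PER_GB,  # term positions
}

# RAM figure quoted earlier in the thread (its exact purpose is truncated there).
ram_mb = 1200.0

total_mb = sum(index_files_mb.values())
total_gb = total_mb / MB_PER_GB

# How many times larger the index files are than the quoted RAM figure.
index_to_ram_ratio = total_mb / ram_mb

# Share of the total taken up by the positions file alone.
prx_share = index_files_mb[".prx"] / total_mb

print(f"total index size: {total_gb:.1f} GB")
print(f"index is about {index_to_ram_ratio:.0f}x the quoted RAM figure")
print(f".prx accounts for {prx_share:.0%} of the total")
```

Under these assumptions the positions file dominates the index, which is consistent with phrase queries (the ones that need position data) being the slow case.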