Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
Hi, we use Lucene to store around 300 million records. We use the index both for conventional searching and for all the system's data - we replaced MySQL with Lucene because MySQL was simply not working at all with that number of records. Our problem is that we have HUGE

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Shai Erera
Hi, First, you can use MatchAllDocsQuery, which matches all documents. It avoids traversing a HUGE posting list (TAG:TAG) and performs much faster. For example, TAG:TAG computes a score for each doc, even though you don't need it. MatchAllDocsQuery doesn't. Second, move away from Hits! :) Use Collectors
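
As a rough illustration of the collector idea mentioned above, here is a minimal plain-Java sketch. It is not Lucene's Collector API: the class TopNCollectorSketch, the Hit type, and the collect(doc, score) signature are hypothetical names chosen for this example. It only shows the general technique of visiting every hit once, counting it, and keeping just the N best in a small heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java model of a "top N" collector. Hypothetical names; NOT Lucene's API.
class TopNCollectorSketch {
    // Hypothetical hit record: a document id plus its score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap on score, so the weakest retained hit is always at the head.
    private final PriorityQueue<Hit> heap;
    private int totalHits = 0;

    TopNCollectorSketch(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
        this.heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    }

    // Called once per matching document.
    void collect(int doc, float score) {
        totalHits++;                         // every match is counted
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));   // still room: keep it
        } else if (score > heap.peek().score) {
            heap.poll();                     // evict the weakest retained hit
            heap.add(new Hit(doc, score));
        }
    }

    int getTotalHits() { return totalHits; }

    // Retained hits, best score first.
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    public static void main(String[] args) {
        TopNCollectorSketch c = new TopNCollectorSketch(3);
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        System.out.println(c.getTotalHits() + " hits, best doc " + c.topHits().get(0).doc);
    }
}
```

Memory use in this model depends on N rather than on the number of matches, which is the property that matters with hundreds of millions of records.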

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
Hi! Thanks so much!!
* I'll check the documentation for MatchAllDocsQuery.
* I'm already changing my code to create BooleanQueries instead of filters - is that better than MatchAllDocsQuery or is it the same?
* Is using MatchAllDocsQuery the only way to disable scoring?
* Would you have any

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
What is the main difference between Hits and Collectors? - Mike aka...@gmail.com On Mon, Nov 30, 2009 at 11:03 AM, Uwe Schindler u...@thetaphi.de wrote: And if you only have a filter and apply it to all documents, make

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Erick Erickson

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Ian Lea

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
I'll definitely switch to a Collector. It's just not clear to me whether I should use BooleanQueries or MatchAllDocsQuery+Filters. And should I write my own collector, or is the TopDocs one perfect for me

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
The problem with this method is that I won't be able to know how many total results / pages a search has. For example, if I do a search X that returns 1,000,000 records, so 5,000 pages of 200 items, I
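
The page arithmetic in the example above can be checked with a small sketch. It assumes the total hit count is already known (in Lucene 2.9, TopDocs carries a totalHits field); PagingSketch and its method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for turning a total hit count into page numbers.
class PagingSketch {
    // Number of pages needed for totalHits results, rounding up for a partial last page.
    static long pageCount(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize > 0");
        }
        return (totalHits + pageSize - 1) / pageSize;
    }

    // Index of the first result on a given page (pages are numbered from 0).
    static long firstIndexOfPage(long page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        return page * pageSize;
    }

    public static void main(String[] args) {
        long pages = pageCount(1_000_000L, 200);
        // The last page starts pageSize results before the end when the total divides evenly.
        long lastPageStart = firstIndexOfPage(pages - 1, 200);
        System.out.println(pages + " pages, last page starts at result " + lastPageStart);
    }
}
```

With 1,000,000 results and 200 items per page this gives 5,000 pages, matching the figures in the message; a total that is not a multiple of 200 adds one final partial page.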