Re: Any tips for indexing large amounts of data?

2007-11-02 Thread Brendan Grainger
Thanks so much for your suggestions. I am attempting to index 550K docs at once, but have found I've had to break them up into smaller batches. Indexing seems to stop at around 47K docs (the index reaches 264M in size at this point). The index itself eventually grows to about 2 GB. I am
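
Breaking a large document set into smaller batches, as described in this message, might look roughly like the sketch below. The `batches` helper and the batch size are illustrative assumptions, not anything taken from the thread or from Solr itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        # Pull the next `size` items; an empty list means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Assumed batch size of 4, chosen only for illustration.
    for batch in batches(range(10), 4):
        print(len(batch), batch)
```

Each batch could then be posted and committed separately, so a failure partway through does not require restarting the whole run.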

Re: Phrase Query Performance Question

2007-11-02 Thread Walter Underwood
He means extremely frequent and I agree. --wunder On 11/2/07 1:51 AM, Haishan Chen [EMAIL PROTECTED] wrote: Thanks for the advice. You certainly have a point. I believe you mean a query term that appears in 5-10% of an index in a natural language corpus is extremely INFREQUENT?

Solr and Lucene Indexing Performance

2007-11-02 Thread Jae Joo
Hi, I have 6 million articles to be indexed by Solr and need your recommendation. I need to parse them and generate Solr XML files to post. How about using Lucene directly? In a short test, it looks like Solr-based indexing is faster than direct indexing through Lucene. Am I
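
For readers unfamiliar with the XML that gets posted to Solr, a minimal sketch of generating an `<add>` message follows. The function name `solr_add_xml` and the field names `id` and `title` are assumptions for illustration; real field names depend on the schema in use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def solr_add_xml(docs: Iterable[Dict[str, object]]) -> str:
    """Build a Solr <add> update message from dicts of field name -> value."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each field becomes <field name="...">value</field>;
            # ElementTree escapes special characters such as & and <.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical article with assumed field names.
    print(solr_add_xml([{"id": "1", "title": "Indexing A & B"}]))
```

The resulting string would be sent to the Solr update handler; how it is posted (and how often to commit) is left out here because the thread does not specify it.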

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
From: [EMAIL PROTECTED] Subject: Re: Phrase Query Performance Question Date: Thu, 1 Nov 2007 11:25:26 -0700 To: solr-user@lucene.apache.org On 31-Oct-07, at 11:54 PM, Haishan Chen wrote: Date: Wed, 31 Oct 2007 17:54:53 -0700 Subject: Re: Phrase Query Performance Question From:

Re: Phrase Query Performance Question

2007-11-02 Thread Mike Klaas
On 2-Nov-07, at 10:03 AM, Haishan Chen wrote: Date: Fri, 2 Nov 2007 07:32:30 -0700 Subject: Re: Phrase Query Performance Question From: [EMAIL PROTECTED] To: solr- [EMAIL PROTECTED] He means extremely frequent and I agree. --wunder Then it means a PHRASE (combination of terms

Re: Solr and Lucene Indexing Performance

2007-11-02 Thread Mike Klaas
On 2-Nov-07, at 11:41 AM, Jae Joo wrote: Hi, I have 6 million articles to be indexed by Solr and need your recommendation. I need to parse them and generate Solr XML files to post. How about using Lucene directly? In a short test, it looks like Solr-based indexing is

Re: Phrase Query Performance Question

2007-11-02 Thread Chris Hostetter
: It still feels to me that you are trying to do something unique with your : phrase queries. Unfortunately, you still haven't said what you are trying to : do in general terms, which makes it very difficult for people to help you. Agreed. This seems like a very special case, but we don't know what

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
Date: Fri, 2 Nov 2007 12:31:29 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Phrase Query Performance Question : It still feels to me that you are trying to do something unique with your : phrase queries. Unfortunately, you still haven't said what you are

Re: Solr production live implementation

2007-11-02 Thread Otis Gospodnetic
Hi Tim (switching to the more appropriate solr-user list). It's hard to tell and depends on things like the integration of search into the rest of the site, the placement of the search field/form, the exposure, etc. The corpus/index does not sound large, but the mention of Windows scares me, as does 2GB