Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi, we have an index of courses (about 4 million docs in prod) and a nightly job that picks up newly added courses and updates the index accordingly. There is another Enterprise system that shares the same table and could delete data from it too. I just want to know

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
I am guessing your Enterprise system deletes/updates tables in the RDBMS, and your SOLR indexes that data. In addition, you have a front end interacting with both SOLR and the RDBMS. At the front-end level, when a search sent to SOLR returns primary keys for data, you may check your

Re: Best practice advice needed!

2008-09-25 Thread Erick Erickson
How long does it take to build the entire index? Can you just rebuild it from scratch every night? That would be the simplest. Best, Erick On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar [EMAIL PROTECTED] wrote: Hi, We have an index of courses (about 4 million docs in prod) and we have a

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
This will cause the result counts to be wrong, and the deleted docs will stay in the search index forever. Some approaches for incremental update: * full sweep garbage collection: fetch every ID in the Solr DB and check whether it exists in the source DB, then delete the ones that don't exist.
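The full sweep described above can be sketched roughly as follows. This is only an illustration, not code from the thread: `find_orphaned_ids` and `batches` are hypothetical helper names, and the sketch assumes both ID lists have already been fetched (how they are read from Solr and from the source DB is left out).

```python
from typing import Iterable, List


def find_orphaned_ids(index_ids: Iterable[str], source_ids: Iterable[str]) -> List[str]:
    """Return IDs that are in the search index but no longer in the source DB.

    Hypothetical helper: both arguments are assumed to be already-fetched IDs.
    """
    live = set(source_ids)
    seen = set()
    orphans = []
    for doc_id in index_ids:
        # Keep index order and skip duplicates so each ID is deleted only once.
        if doc_id not in live and doc_id not in seen:
            seen.add(doc_id)
            orphans.append(doc_id)
    return orphans


def batches(ids: List[str], size: int) -> List[List[str]]:
    """Split the delete list into fixed-size batches (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each batch would then be sent to the index as deletes. With about 4 million IDs, holding the source IDs in one in-memory set is probably workable, but that is an assumption about available memory rather than something stated in the thread.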

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
That should be "flag it in a boolean column." --wunder On 9/25/08 11:51 AM, Walter Underwood [EMAIL PROTECTED] wrote: This will cause the result counts to be wrong and the deleted docs will stay in the search index forever. Some approaches for incremental update: * full sweep garbage

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Great, thanks. Date: Thu, 25 Sep 2008 11:54:32 -0700 Subject: Re: Best practice advice needed! From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org That should be flag it in a boolean column. --wunder On 9/25/08 11:51 AM, Walter Underwood [EMAIL PROTECTED] wrote

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi Fuad, since I don't have much data (4 million docs), I don't have a master/slave setup yet. How big a change would that be? Date: Thu, 25 Sep 2008 10:08:51 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Best practice advice needed! I am guessing your

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
About web spiders: I simply use a last-modified timestamp field in SOLR, and I expire items after 30 days. If an item was updated (timestamp changed), it won't be deleted. If I delete it from the database, it will be deleted from SOLR within 30 days. Spiders don't need 'transactional' updates.
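A minimal sketch of the 30-day expiry rule, under stated assumptions: the field name `last_modified` is made up for illustration (the message does not name the field), and `is_expired` and `expiry_query` are hypothetical helpers. The query string uses Solr's date-math range syntax and is meant to be sent as a delete-by-query; sending it is not shown.

```python
from datetime import datetime, timedelta, timezone


def is_expired(last_modified: datetime, now: datetime, max_age_days: int = 30) -> bool:
    """True if the item has not been touched within max_age_days."""
    return now - last_modified > timedelta(days=max_age_days)


def expiry_query(field: str = "last_modified", max_age_days: int = 30) -> str:
    """Build a range query matching docs older than max_age_days.

    The field name is an assumption; substitute the timestamp field
    actually defined in the schema.
    """
    return f"{field}:[* TO NOW-{max_age_days}DAYS]"


# Example: check one item against a fixed "now".
now = datetime(2008, 9, 25, tzinfo=timezone.utc)
stale = is_expired(datetime(2008, 8, 1, tzinfo=timezone.utc), now)
```

Because every re-crawl refreshes the timestamp, only items that have disappeared from the source stop being refreshed and eventually match the query.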