On 04/08/2010 09:23 AM, Rich Cariens wrote:
Are there any best practices or built-in support for keeping track of what's
been indexed in a Solr application so as to support a full rebuild?  I'm not
indexing from a single source, but from many, sometimes arbitrary, sources
including:

    1. A document repository that fires events (containing a URL) when new
    documents are added to the repo;
    2. A book-marking service that fires events containing URLs when users of
    that service bookmark a URL;
    3. More services that raise events that make Solr update docs indexed via
    (1) or (2) with additional metadata (think user comments, tagging, etc).

I'm looking at ~200M documents for the initial launch, with around 30K new
docs every day, and many thousands of metadata events every day.

Do any of you Solr gurus have any suggestions or guidance you can share with
me?

Thanks in advance,
Rich


You could pump everything through an UpdateProcessor that writes out Solr XML as docs go by, then replay those files when you need a full rebuild?
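
To make that a bit more concrete, here is a rough sketch of the "write Solr XML as docs go by" part. It uses plain Java with no Solr dependency, so the class name SolrXmlJournal, its methods, and the one-message-per-line file layout are all assumptions made up for illustration, not anything Solr ships. In a real setup something like record() would presumably be called from a custom UpdateRequestProcessor's processAdd(), with the field map built from the incoming document.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

// Hypothetical helper (not part of Solr): appends every document it sees to a
// journal file as a standalone Solr XML <add> message, one message per line.
final class SolrXmlJournal {
    private final Path journal;

    SolrXmlJournal(Path journal) {
        this.journal = journal;
    }

    // Escape XML metacharacters; line breaks become character references so
    // that each journaled message stays on a single line.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\n': sb.append("&#10;");  break;
                case '\r': sb.append("&#13;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Serialize one document as <add><doc><field name="...">...</field></doc></add>.
    // Simplification: single-valued string fields only, and characters that
    // are illegal in XML 1.0 are not filtered out.
    static String toSolrXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Append one document to the journal, creating the file if needed.
    synchronized void record(Map<String, String> fields) throws IOException {
        try (Writer w = Files.newBufferedWriter(journal, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(toSolrXml(fields));
            w.write('\n');
        }
    }
}
```

Each line of the resulting file is a complete Solr XML add message, so a rebuild could be a matter of posting the lines back to the update handler in order. Metadata updates from the later event sources would need to be journaled the same way (as full re-adds of the affected doc) for the replay to reproduce the final index state.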

--
- Mark

http://www.lucidimagination.com
