And there's an open issue where this sort of feature can be
contributed: https://issues.apache.org/jira/browse/SOLR-903
Note that two different approaches are mentioned in that issue: one is
purely SolrJ client-side (my original intention in opening the issue),
and the other is what Mark mentions, on the server side, to allow
update requests to be logged completely.
Erik
On Apr 8, 2010, at 11:45 AM, Rich Cariens wrote:
Thanks Mark. That's sort of what I was thinking of doing.
On Thu, Apr 8, 2010 at 10:33 AM, Mark Miller <markrmil...@gmail.com> wrote:
On 04/08/2010 09:23 AM, Rich Cariens wrote:
Are there any best practices or built-in support for keeping track of what's
been indexed in a Solr application so as to support a full rebuild? I'm not
indexing from a single source, but from many, sometimes arbitrary, sources
including:
1. A document repository that fires events (containing a URL) when new
documents are added to the repo;
2. A book-marking service that fires events containing URLs when users of
that service bookmark a URL;
3. More services that raise events that make Solr update docs indexed via
(1) or (2) with additional metadata (think user comments, tagging, etc.).
I'm looking at ~200M documents for the initial launch, with around 30K new
docs every day, and many thousands of metadata events every day.
Do any of you Solr gurus have any suggestions or guidance you can share with
me?
Thanks in advance,
Rich
Pump everything through an UpdateProcessor that writes out SolrXML as docs
go by?
--
- Mark
http://www.lucidimagination.com
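
For readers following the thread, Mark's suggestion can be sketched roughly as
below. This is a standalone Python illustration, not Solr's
UpdateRequestProcessor API: the function names (to_solr_xml, log_update,
replay) and the one-message-per-line log layout are assumptions made for the
example. It renders each document as a SolrXML <add> message as it goes by and
appends it to a log that could later be replayed for a full rebuild.

```python
import io
from xml.sax.saxutils import escape, quoteattr

# Newlines inside field values are written as character references so that
# every logged <add> message stays on a single line of the log.
_NEWLINES = {"\n": "&#10;", "\r": "&#13;"}


def to_solr_xml(doc):
    """Render one document (field name -> value or list of values) as a SolrXML <add> message."""
    parts = ["<add><doc>"]
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(v), _NEWLINES))
            )
    parts.append("</doc></add>")
    return "".join(parts)


def log_update(doc, log):
    """Append the document to the replay log, one <add> message per line."""
    log.write(to_solr_xml(doc) + "\n")


def replay(log, post):
    """Re-send every logged message in order.

    `post` is a hypothetical callable that delivers one XML message to the
    update handler (for example, an HTTP POST); it is not defined here.
    Returns the number of messages replayed.
    """
    count = 0
    for line in log:
        line = line.strip()
        if line:
            post(line)
            count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-in for the log file, just to show the round trip.
    log = io.StringIO()
    log_update({"id": "doc1", "tags": ["solr", "rebuild"]}, log)
    log.seek(0)
    replay(log, print)
```

Logging the full update message, rather than only the source URL, means the
metadata events from case (3) are captured in the same log and are replayed in
the order they originally arrived.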