I can do that, but it will come after I finish some requirements for the
next-generation Nutch. :) I do consider shard management to be part of that.
Dennis
Otis Gospodnetic wrote:
And there is http://wiki.apache.org/solr/DistributedSearch , but this talks
*only* about search.
Dennis, are you the man to take what's on DistributedLucene and
DistributedSearch and come up with a marriage proposal? :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, April 14, 2008 1:01:37 PM
Subject: Re: Next Generation Nutch
Dennis Kubes wrote:
Otis Gospodnetic wrote:
I suppose the first thing to do would be to describe the requirements for
this shard management. I imagine you have very specific functionality
in mind from your Wikia Search experience. Mind putting your ideas on
the Wiki? I think it would be very good to share this with
[EMAIL PROTECTED] early on, so we can come up with something general
that fits both Nutch and Solr. It might turn out that this calls for
a separate Lucene project, but we'll see that once the real discussion
starts.
I completely agree. This would be better as a shared project. I will
put my current thoughts down on the Nutch wiki, unless there is already
a discussion going somewhere?
There is a description of a related concept here:
http://wiki.apache.org/hadoop/DistributedLucene . However, this
addresses only the index part of the shard. In our case shards also
contain plain text (for summaries) and the original binary content (for
cached preview), and possibly other parts (NUTCH-466), none of which
is managed by this code.