Hi,

On 6/26/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Is this actually planned (addition of SolrIndexer to Nutch)?
> A search for SolrIndexer in JIRA got no hits.

There is NUTCH-442 (one of the most popular issues), but there have been
no further developments since Sami's work.

I think Sami Siren's original patch no longer works with Solr, and I am
not sure whether it still applies to Nutch. So, if anyone wants to
tackle this, here are a couple of items off the top of my head:

1) Bring Sami's patch up to date (with both Solr and Nutch). I think a
separate Indexer job is unnecessary; we should just change
Indexer.OutputFormat to check for a parameter and, if it's true, have
OutputFormat also send documents to Solr (besides writing them to the
Lucene index in DFS).

2) Make it work in distributed setups (i.e. with more than one index
server). Sami Siren also makes a note of this, but I don't believe
that a simple hash-the-URL approach is appropriate for Nutch. It would
be nice to guarantee that a URL always goes to the same indexing
server, even if we add or remove index servers (if we just take the
hash of the URL modulo the number of servers, then adding a new machine
would cause pretty much all URLs to be redistributed to different
servers).

3) We need to write a SolrSearcher similar to o.a.n.s.IndexSearcher (so
that Solr is a drop-in replacement for IndexSearcher). This class
should handle things like generating summaries. This one is easy
(if a bit boring :).
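For item 1, the idea could look roughly like the following minimal, self-contained Java sketch. It is not the Nutch, Hadoop, or SolrJ API: `DocumentSink`, `TeeWriter`, and the configuration key `indexer.solr.enabled` are hypothetical names, and the two sinks are in-memory stand-ins for the Lucene index in DFS and a Solr server. It only shows the shape of the proposal: one output path that always writes the local index and, when a parameter is set, also forwards each document to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are real Nutch, Hadoop, or SolrJ classes.
final class TeeIndexSketch {

    /** Hypothetical stand-in for "somewhere a document can be indexed". */
    interface DocumentSink {
        void add(Map<String, String> doc);
    }

    /** In-memory sink standing in for the DFS Lucene index or a Solr server. */
    static final class RecordingSink implements DocumentSink {
        final List<Map<String, String>> docs = new ArrayList<>();

        @Override
        public void add(Map<String, String> doc) {
            docs.add(doc);
        }
    }

    /** Assumed (hypothetical) configuration key; the real name would be chosen in the patch. */
    static final String SOLR_ENABLED_KEY = "indexer.solr.enabled";

    /** Always writes to the local index; also writes to Solr when the parameter is "true". */
    static final class TeeWriter implements DocumentSink {
        private final DocumentSink luceneIndex;
        private final DocumentSink solr; // null when Solr output is disabled

        TeeWriter(Map<String, String> conf, DocumentSink luceneIndex, DocumentSink solr) {
            this.luceneIndex = luceneIndex;
            boolean enabled = Boolean.parseBoolean(conf.getOrDefault(SOLR_ENABLED_KEY, "false"));
            this.solr = enabled ? solr : null;
        }

        @Override
        public void add(Map<String, String> doc) {
            luceneIndex.add(doc); // existing behaviour: write the Lucene index in DFS
            if (solr != null) {
                solr.add(doc);    // new behaviour: also send the document to Solr
            }
        }
    }
}
```

The design point is that no separate job is needed: the writer that already produces the DFS index checks one parameter, so leaving it unset keeps today's behaviour unchanged.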
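For item 2, one well-known technique with roughly the property described is consistent hashing. The mail does not name it, so treat it as a swapped-in suggestion. Each server is placed at many points on a hash ring, and a URL goes to the first server point at or after the URL's hash. Adding a server moves only the URLs that now fall to the new server (about 1/N of them), and removing one moves only that server's URLs, whereas `hash(url) mod N` moves most URLs whenever N changes. The class name, the MD5-based hash, and the 100 points per server are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of URL-to-index-server routing; not part of Nutch or Solr.
final class UrlRouting {

    /** 64-bit hash from the first 8 bytes of an MD5 digest (stable across JVMs). */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    /** The naive "hash the URL" scheme: server index = hash(url) mod number of servers. */
    static int modN(String url, int servers) {
        return (int) Math.floorMod(hash(url), (long) servers);
    }

    /** Consistent-hash ring: each server owns `replicas` points on a circle of 64-bit hashes. */
    static final class Ring {
        private final TreeMap<Long, String> points = new TreeMap<>();
        private final int replicas;

        Ring(Collection<String> servers, int replicas) {
            this.replicas = replicas;
            for (String s : servers) {
                add(s);
            }
        }

        void add(String server) {
            for (int i = 0; i < replicas; i++) {
                points.putIfAbsent(hash(server + "#" + i), server);
            }
        }

        void remove(String server) {
            for (int i = 0; i < replicas; i++) {
                points.remove(hash(server + "#" + i), server); // only drop this server's own points
            }
        }

        /** First server point at or after hash(url), wrapping around to the start of the ring. */
        String serverFor(String url) {
            if (points.isEmpty()) {
                throw new IllegalStateException("no index servers");
            }
            Map.Entry<Long, String> e = points.ceilingEntry(hash(url));
            return (e != null ? e : points.firstEntry()).getValue();
        }
    }
}
```

Going from four servers to five under `mod N`, a URL stays put only when its hash gives the same remainder for 4 and 5, i.e. about 4 hashes in 20, so roughly 80% of URLs move; on the ring only the share claimed by the new server (around 20%) moves. No scheme can keep a URL on a server that has been removed; consistent hashing just limits the disruption to that server's URLs.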
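For item 3, the "drop-in replacement" idea amounts to callers depending on a common searcher interface, so that a Solr-backed implementation can be swapped in for the Lucene-backed one. The sketch below is hypothetical throughout: `Searcher`, `InMemorySearcher`, and `snippet` are invented names (not Nutch's `IndexSearcher` API or SolrJ), the in-memory map stands in for a Solr server, and the summarizer is a deliberately naive window around the first match, not whatever summarizer Nutch actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these are not Nutch's IndexSearcher API or SolrJ classes.
final class SearcherSketch {

    /** Hypothetical interface that callers would depend on instead of a concrete searcher. */
    interface Searcher {
        List<String> search(String term, int maxHits); // URLs of matching documents
        String summary(String url, String term);       // short snippet shown with a hit
    }

    /** In-memory stand-in for a Solr-backed searcher: maps URL to document text. */
    static final class InMemorySearcher implements Searcher {
        private final Map<String, String> docs;

        InMemorySearcher(Map<String, String> docs) {
            this.docs = docs;
        }

        @Override
        public List<String> search(String term, int maxHits) {
            String t = term.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                if (hits.size() >= maxHits) {
                    break;
                }
                if (e.getValue().toLowerCase(Locale.ROOT).contains(t)) {
                    hits.add(e.getKey());
                }
            }
            return hits;
        }

        @Override
        public String summary(String url, String term) {
            return snippet(docs.getOrDefault(url, ""), term, 20);
        }
    }

    /**
     * Naive summary: up to `radius` characters either side of the first case-insensitive
     * match, with "..." marking cut text. Assumes lowercasing keeps the text length
     * unchanged (true for ASCII).
     */
    static String snippet(String text, String term, int radius) {
        int i = text.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        if (i < 0) {
            return text.substring(0, Math.min(text.length(), 2 * radius)); // no match: leading text
        }
        int start = Math.max(0, i - radius);
        int end = Math.min(text.length(), i + term.length() + radius);
        return (start > 0 ? "..." : "") + text.substring(start, end) + (end < text.length() ? "..." : "");
    }
}
```

A real SolrSearcher would run the query against Solr and might build summaries from stored fields or Solr's highlighting instead; the point here is only that code written against the interface does not change when the implementation does.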

If anyone is interested, I would be glad to help him/her with the
Nutch side of things. I would also like to work on it myself, but I
don't have time right now.

>
> Otis
>
>
> ----- Original Message ----
> From: Brian Whitman <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Saturday, June 23, 2007 4:13:02 PM
> Subject: Re: [Nutch-general] Integrate nutch crawler with Solr index server
>
>
> On Jun 23, 2007, at 8:37 AM, David Xiao wrote:
> > As the title says, I'm having some difficulty integrating them. I
> > tried to follow the instructions at
> > http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
> > but I don't really understand the Java code part. The article doesn't
> > go into detail on configuring Solr. I have downloaded solr-client.zip,
> > but what do I do with it on the Nutch side?
>
>
> It's my understanding that the code Sami posted will no longer work
> with recent versions of Solr / solrj.
>
> However, the Solr client (SOLR-20) was recently added to trunk
> (http://issues.apache.org/jira/browse/SOLR-20#action_12505314). I sent
> Sami a patch against his posted code, so hopefully we'll see SolrIndexer
> get into Nutch trunk sometime soon.
>
> As for configuring Solr, that post does a good job of explaining it.
> There's not much to it: just use the schema he posted and start Solr
> normally.
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Nutch-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
>
>
>
>


-- 
Doğacan Güney