Hi,

On 6/26/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Is this actually planned (addition of SolrIndexer to Nutch)?
> A search for SolrIndexer in JIRA got no hits.

There is NUTCH-442 (one of the most popular issues). But after Sami's
work there have been no further developments. I think Sami Siren's
original patch no longer works with Solr, and I am not sure whether it
still applies to Nutch.

So, if anyone wants to tackle this, here are a couple of items off the
top of my head:

1) Bring Sami's patch up to date (with both Solr and Nutch). I think a
separate Indexer job is unnecessary; we should just change
Indexer.OutputFormat to check for a parameter and, if it is true,
OutputFormat should also send documents to Solr (besides writing them
to the Lucene index in DFS).

2) Make it work in distributed setups (i.e. with more than one index
server). Sami Siren also makes a note of this, but I don't believe that
a simple hash-the-url approach is appropriate for Nutch. It would be
nice to guarantee that a url always goes to the same indexing server,
even if we add or remove index servers (if we just take the hash of the
url, then adding a new machine would cause pretty much all urls to be
distributed to different servers).

3) We need to code a SolrSearcher similar to o.a.n.s.IndexSearcher (so
that Solr is a drop-in replacement for IndexSearcher). This class
should handle stuff like generating summaries, etc. This one is easy
(if a bit boring :).

If anyone is interested, I would be glad to help with the Nutch side of
things. I would also like to work on it, but I don't have time right
now.

>
> Otis
>
>
> ----- Original Message ----
> From: Brian Whitman <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Saturday, June 23, 2007 4:13:02 PM
> Subject: Re: [Nutch-general] Integrate nutch crawler with Solr index server
>
>
> On Jun 23, 2007, at 8:37 AM, David Xiao wrote:
> > As title said, I have some difficult to integrate them together. I
> > tried to followed instruction at
> > http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
> > but I don't actually
> > understand part that java piece of code.
> > In article it doesn't go
> > detail configuration of Solr. I have download solr-client.zip but
> > what to do with Nutch?
>
>
> It's my understanding that the code Sami posted will no longer work
> with recent versions of Solr / solrj.
>
> However, the solr client (SOLR-20) was recently added to trunk,
> http://issues.apache.org/jira/browse/SOLR-20#action_12505314 , I sent
> Sami a patch on his posted code and hopefully we'll see SolrIndexer
> get into Nutch trunk sometime soon?
>
> As far as configuration of Solr, that post does a good job at
> explaining it, there's not much to it- just use the schema he posted
> and start Solr normally.
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> Nutch-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
>
>
>

-- 
Doğacan Güney
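
To illustrate item 2 of the message (keeping a url on the same index server when servers are added or removed): one technique with that property is rendezvous (highest-random-weight) hashing. Nothing in this thread or in Sami's patch specifies it; it is swapped in here only as a sketch. The server names and all function names below are hypothetical, and the sketch is in Python rather than Nutch's Java purely for brevity.

```python
import hashlib


def _hash64(text):
    """Stable 64-bit hash of a string (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server_modulo(url, servers):
    """Naive hash-the-url routing: hash(url) modulo the number of servers."""
    return servers[_hash64(url) % len(servers)]


def pick_server(url, servers):
    """Rendezvous hashing: score every (server, url) pair, highest score wins.

    Adding a server only moves the urls for which the new server scores
    highest; removing a server only moves the urls it was holding.
    """
    if not servers:
        raise ValueError("need at least one index server")
    return max(servers, key=lambda server: _hash64(server + "|" + url))


def moved_fraction(pick, urls, before, after):
    """Fraction of urls whose assigned server differs between two server lists."""
    moved = sum(1 for url in urls if pick(url, before) != pick(url, after))
    return moved / len(urls)


if __name__ == "__main__":
    # Hypothetical urls and index server names, for illustration only.
    urls = ["http://example.com/page/%d" % i for i in range(2000)]
    before = ["solr1", "solr2", "solr3"]
    after = before + ["solr4"]
    print("modulo moved:    ", moved_fraction(pick_server_modulo, urls, before, after))
    print("rendezvous moved:", moved_fraction(pick_server, urls, before, after))
```

With n servers, adding one more should move roughly 1/(n+1) of the urls under rendezvous hashing, versus most of them under the modulo approach. The cost is one hash per server per url, which is probably fine for a handful of index servers; a consistent-hash ring would be the usual alternative for larger clusters.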
