Hi Markus and Sethi,

Thank you for your replies. I was stuck, so I tried the following and it works for me:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

May I know where the Lucene index directory for the /crawl folder is? I would like to use Luke (the Lucene Index Toolbox) to open the index.

On a side note, I think Nutch 1.2 is much better than Nutch 1.3. Nutch 1.2 creates a Lucene index automatically, does not need Solr, and has more functions. Why was the Lucene index removed from Nutch 1.3 in the first place?

________________________________
From: "Sethi, Parampreet" <[email protected]>
To: "[email protected]" <[email protected]>; Kelvin <[email protected]>
Sent: Tuesday, 19 July 2011 11:51 PM
Subject: Re: SolrDeleteDuplicates error

Hey Kelvin,

The issue is a version incompatibility between the SolrJ client bundled with Nutch 1.3 and the Solr server that you are running. Remove the SolrJ client jar from the nutch/runtime/local/lib folder and copy the SolrJ client jar from your Solr installation into it. This will solve your problem. (I was facing the same error yesterday and this fix resolved it =) )

I am writing up notes on how I set up Nutch with Solr, including the issues I faced and their solutions, on my blog:
http://param-techie.blogspot.com/2011/07/nutch-13-and-solr-integration.html

Hope it helps!
-param

On 7/19/11 11:28 AM, "Markus Jelsma" <[email protected]> wrote:

> The solrdedup job completes without failure, it is the solrindex job that's
> actually failing. See your hadoop.log and check Solr's output.
>
> On Tuesday 19 July 2011 17:23:51 Kelvin wrote:
>> Sorry for the multiple postings. I am trying out nutch 1.3, which requires
>> solr for indexing
>>
>> I try to crawl and index with solr with this simple command
>> bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 10
>>
>> But why does it gives me the following error? Thank you for your kind help
>>
>>
>> SolrIndexer: starting at 2011-07-19 23:13:31
>> java.io.IOException: Job failed!
>> SolrDeleteDuplicates: starting at 2011-07-19 23:13:33
>> SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
>> SolrDeleteDuplicates: finished at 2011-07-19 23:13:34, elapsed: 00:00:01
>> crawl finished: crawl-20110719231304
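
For anyone following along, the jar swap Param describes could be scripted roughly as below. This is only a sketch under assumptions: the function name swap_solrj, the solrj-backup folder, and the "*solrj*.jar" filename pattern are all made up for illustration and are not from the thread, so check the actual jar names in your own Nutch and Solr folders first. It moves the bundled jar aside rather than deleting it, so the change is easy to undo.

```shell
#!/bin/sh
# Sketch: replace the SolrJ client jar in a Nutch lib folder with the one
# from the running Solr installation. The directory arguments and the
# "*solrj*.jar" filename pattern are assumptions, not taken from the thread.
swap_solrj() {
    nutch_lib=$1   # e.g. nutch/runtime/local/lib
    solr_dist=$2   # hypothetical: wherever your Solr install keeps its SolrJ jar
    backup="$nutch_lib/solrj-backup"

    # Refuse to touch anything unless the Solr side actually has a SolrJ jar.
    set -- "$solr_dist"/*solrj*.jar
    [ -e "$1" ] || { echo "no SolrJ jar found in $solr_dist" >&2; return 1; }

    # Move the bundled jar(s) aside instead of deleting them.
    mkdir -p "$backup" || return 1
    for jar in "$nutch_lib"/*solrj*.jar; do
        [ -e "$jar" ] && mv "$jar" "$backup"/
    done

    # Copy in the jar(s) that match the Solr server version.
    cp "$@" "$nutch_lib"/
}
```

It would be called as something like `swap_solrj nutch/runtime/local/lib /path/to/solr/dist` (the second path is a placeholder for your own Solr installation), after which the solrindex command can be re-run.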

