The dedup stage fails with the following error:

SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/collection5
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:390)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:395)
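For context, the dedup stage of the 1.x crawl script comes down to a single SolrDeleteDuplicates invocation, roughly like this (a sketch only; the Solr URL is the one from the error above, and the exact command name can differ between Nutch versions):

    # run the Solr duplicate-deletion job against the same index used for indexing
    $bin/nutch solrdedup http://localhost:8983/solr/collection5

The "Job failed!" message itself is just Hadoop's generic JobClient error, so the real cause is usually in logs/hadoop.log or the task logs rather than in this stack trace.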
On Sat, Jun 22, 2013 at 8:03 AM, Tejas Patil <[email protected]> wrote:
> Thanks Joe for pointing it out. There was a jira [0] for this bug and the
> change is already present in the trunk.
>
> [0] : https://issues.apache.org/jira/browse/NUTCH-1500
>
>
> On Fri, Jun 21, 2013 at 7:11 PM, Joe Zhang <[email protected]> wrote:
>
> > The new crawl script is quite useful. Thanks for the addition.
> >
> > It comes with a bug, though:
> >
> > Line 169:
> > $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb
> > $CRAWL_PATH/linkdb $SEGMENT
> >
> > should be:
> >
> > $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb
> > $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
> >
> > instead.
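For anyone patching their copy of bin/crawl before the NUTCH-1500 fix ships in a release, the relevant part of the script looks roughly like the sketch below. The way $SEGMENT is derived here is an assumption based on the trunk script and may differ in your version:

    # pick the newest segment produced by the fetch/parse cycle (assumed derivation)
    SEGMENT=`ls "$CRAWL_PATH"/segments/ | sort -n | tail -n 1`

    # index into Solr; note the full segment path, per Joe's fix
    $bin/nutch solrindex $SOLRURL $CRAWL_PATH/crawldb -linkdb \
        $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT

With only $SEGMENT (the bare directory name) on that line, solrindex is handed a path that does not exist, which is a common way to end up with a failed indexing or dedup job like the one above.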

