Hi,
I am brand new to Nutch and Solr. I've been trying to install both programs
and integrate them, following these two tutorials:
http://wiki.apache.org/nutch/RunningNutchAndSolr
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
Solr appears to be installed correctly, because I was able to start it with
'java -jar start.jar'.
However, when I tried to run Nutch with './bin/crawl.sh crawl.s', I got this
error:
*Injector: starting
Injector: crawlDb: crawl.s/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl.s/segments/20081225010322
Generator: filtering: true
Generator: topN: 1000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
processing segment drwxr-xr-x - tomcat tomcat 4096 2008-12-23 00:38 /opt/tomcat6/nutch/crawl.s/segments/20081223003839
Fetcher: starting
Fetcher: segment: drwxr-xr-x
Fetcher: java.io.IOException: Segment already fetched!
at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:50)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:778)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:531)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:566)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:538)
Command exited with abnormal status, bailing out.*
I don't know what is wrong. Could someone on the list help me out? Thanks!
--
Signature: Success is a journey that never ends.