Hello everyone,
I am new to Nutch and am having a problem with my initial deployment.
Nutch does not seem to parse the SEGMENT string properly and tries to
read a folder that does not exist. I am using Ubuntu Server 10.10
with Nutch 1.1. Could this have something to do with Ubuntu? Any ideas?
If I run echo $SEGMENT, I get: "crawl/segments/ls -tr crawl/segments|tail -1"
r...@nutchmaster:/usr/share/nutch# export SEGMENT=crawl/segments/'ls -tr crawl/segments|tail -1'
r...@nutchmaster:/usr/share/nutch# bin/nutch fetch $SEGMENT -noParsing
Fetcher: starting
Fetcher: segment: crawl/segments/ls
Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/usr/share/apache-nutch-1.1-bin/crawl/segments/ls/crawl_generate
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:104)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1104)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1111)
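
One possible explanation, offered as a guess from the output above rather than a confirmed diagnosis: the export line uses single quotes ('...'), which the shell keeps as literal text, where backticks (`...`) or $(...) would run the command and insert its output. That would match what echo $SEGMENT prints, and because $SEGMENT is passed unquoted, the shell splits it at the first space, so the fetcher only sees "crawl/segments/ls". The sketch below shows the difference in a scratch directory; the segment names and the LITERAL variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo in a throwaway directory; segment names are invented.
set -e
demo=$(mktemp -d)
cd "$demo"
mkdir -p crawl/segments/20101101000000 crawl/segments/20101102000000
# Give the two directories distinct modification times so `ls -tr` is deterministic.
touch -t 201011010000 crawl/segments/20101101000000
touch -t 201011020000 crawl/segments/20101102000000

# Single quotes: the text is stored literally, nothing is executed.
LITERAL=crawl/segments/'ls -tr crawl/segments|tail -1'

# Command substitution: the pipeline runs and its output is inserted.
# Backticks would behave the same way as $(...) here.
SEGMENT=crawl/segments/$(ls -tr crawl/segments | tail -1)

echo "literal:     $LITERAL"
echo "substituted: $SEGMENT"
```

If this is the cause, Ubuntu is probably not involved; retyping the export line with backticks or $(...) and quoting the argument, as in bin/nutch fetch "$SEGMENT" -noParsing, should hand the fetcher a real segment path.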