Hello everyone,

 

I am new to Nutch and am having a problem with my initial deployment.
Nutch does not seem to parse the SEGMENT string properly and tries to
read folders that do not exist. I am using Ubuntu Server 10.10 with
Nutch 1.1. Could this have something to do with Ubuntu? Any ideas?

 

If I run echo $SEGMENT, I get: "crawl/segments/ls -tr crawl/segments|tail -1"

 

r...@nutchmaster:/usr/share/nutch# export SEGMENT=crawl/segments/'ls -tr crawl/segments|tail -1'

r...@nutchmaster:/usr/share/nutch# bin/nutch fetch $SEGMENT -noParsing

Fetcher: starting
Fetcher: segment: crawl/segments/ls
Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/usr/share/apache-nutch-1.1-bin/crawl/segments/ls/crawl_generate
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:104)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1104)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1140)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1111)
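
For comparison, here is a minimal sketch of how a POSIX shell treats the two quoting styles. It may explain the output above, though it has not been checked against this particular installation. Inside single quotes (') the text is stored literally. Inside backticks (`) the pipeline is executed and its output is substituted. An unquoted $SEGMENT that contains spaces is then split into several arguments, which would be consistent with the fetcher reporting only "crawl/segments/ls". The scratch directory, the two segment names, and the LITERAL variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for a real Nutch crawl directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/crawl/segments/20100101000000" \
         "$workdir/crawl/segments/20100102000000"

# Give the two made-up segments distinct modification times so that
# "ls -tr" (oldest first) has a well-defined order.
touch -t 201001010000 "$workdir/crawl/segments/20100101000000"
touch -t 201001020000 "$workdir/crawl/segments/20100102000000"

cd "$workdir" || exit 1

# Single quotes: the text between them is kept as-is, nothing is executed.
LITERAL=crawl/segments/'ls -tr crawl/segments|tail -1'

# Backticks: the shell runs the pipeline and substitutes its output,
# i.e. the name of the most recently modified segment.
SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`

echo "single quotes: $LITERAL"
echo "backticks:     $SEGMENT"
```

With backticks, echo $SEGMENT should print a path ending in a segment directory name rather than the text of the ls command.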
