Hi,

it looks like the segment is not "addressed" properly:
hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate

Segments are named by a time stamp, e.g.
.../TestCrawl/segments/20140502231126/
and "crawl_generate" is a subdirectory of a single segment. Your log shows
that the fetcher was given the segments directory itself
("Fetcher: segment: TestCrawl/segments") instead of one segment.

Can you specify the exact commands you use to run the crawler?

Sebastian

On 05/03/2014 08:30 PM, BlackIce wrote:
> Hi,
>
> what needs to be copied over to HDFS in Nutch 1.8, or what is the
> command? When trying to run the crawl script under /runtime/deploy I get
> the following:
>
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: starting at 2014-05-03 14:59:03
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: segment: TestCrawl/segments
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher Timelimit set for : 1399132743190
> 14/05/03 14:59:03 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/home/hduser/tmp/mapred/staging/hduser/.staging/job_201405031455_0005
> 14/05/03 14:59:03 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
> 14/05/03 14:59:03 ERROR fetcher.Fetcher: Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
>     at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:106)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>     at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
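To make the segment layout described above concrete, here is a minimal Java sketch (Java 8+). `SegmentPaths`, `latestSegment` and `crawlGeneratePath` are hypothetical names used only for illustration, not Nutch classes. The sketch works on plain directory names rather than on HDFS, and it assumes segment names follow the 14-digit yyyyMMddHHmmss pattern of the example 20140502231126.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, NOT part of the Nutch API: it only illustrates the
 * layout where each segment is a time-stamped directory below "segments"
 * and "crawl_generate" lives inside a single segment.
 */
public class SegmentPaths {

    // Assumed segment name format: yyyyMMddHHmmss, e.g. 20140502231126.
    private static final Pattern TIMESTAMP = Pattern.compile("\\d{14}");

    /** Returns the newest time-stamped entry among the given directory names. */
    public static Optional<String> latestSegment(List<String> names) {
        // Fixed-width, all-digit names sort chronologically as plain strings,
        // so the lexicographic maximum is the most recent segment.
        return names.stream()
                .filter(name -> TIMESTAMP.matcher(name).matches())
                .max(String::compareTo);
    }

    /** Builds <segmentsDir>/<segment>/crawl_generate. */
    public static String crawlGeneratePath(String segmentsDir, String segment) {
        return segmentsDir + "/" + segment + "/crawl_generate";
    }

    public static void main(String[] args) {
        // Example listing of TestCrawl/segments; the second name is made up.
        List<String> names = Arrays.asList("20140502231126", "20140503145500");
        String segment = latestSegment(names)
                .orElseThrow(() -> new IllegalStateException("no segment found"));
        // The fetcher should get TestCrawl/segments/<segment>, which contains crawl_generate.
        System.out.println(crawlGeneratePath("TestCrawl/segments", segment));
    }
}
```

Running `main` prints `TestCrawl/segments/20140503145500/crawl_generate`. On a real cluster the names would come from listing the segments directory on HDFS (for instance with Hadoop's `FileSystem.listStatus`), and it is the segment directory itself, not the `segments` root, that should be passed to the fetcher.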