Re: Nutch 1.8 in pseudo dist error

Sebastian Nagel Sat, 03 May 2014 15:07:27 -0700

Hi,

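It looks like the segment path passed to the fetcher is wrong. The fetcher
was given the segments directory itself (see "Fetcher: segment:
TestCrawl/segments" in your log), so it looks for:

hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate

Segments are named by a time stamp, e.g.
   .../TestCrawl/segments/20140502231126/
and "crawl_generate" is a subdirectory of the segment, not of segments/.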

Can you specify the exact commands to run the crawler?
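
For illustration, here is a minimal sketch of how the newest time-stamped
segment could be picked from an HDFS listing. It assumes segment names are
14-digit time stamps as in the example above. The function name
latest_segment is made up for this sketch, and the commands in the comments
assume a running HDFS and the bin/nutch script from runtime/deploy:

```shell
#!/bin/sh
# Read a directory listing (e.g. the output of "hadoop fs -ls") on stdin
# and print the newest segment name.
# Assumption: each segment is a directory directly under .../segments/
# whose name is a 14-digit time stamp (yyyyMMddHHmmss).
latest_segment() {
  sed -n 's|.*/segments/\([0-9]\{14\}\)$|\1|p' | sort -n | tail -n 1
}

# Hypothetical usage in deploy mode (not run here, needs HDFS):
#   SEGMENT=$(hadoop fs -ls TestCrawl/segments | latest_segment)
#   bin/nutch fetch "TestCrawl/segments/$SEGMENT"
```

The point is only that the fetcher needs the path of one segment, not the
segments/ directory itself.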

Sebastian

On 05/03/2014 08:30 PM, BlackIce wrote:
> Hi,
> 
> What needs to be copied over to HDFS in Nutch 1.8, or what is the
> command? When trying to run the crawl script under /runtime/deploy I get
> the following:
> 
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: starting at 2014-05-03 14:59:03
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: segment: TestCrawl/segments
> 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher Timelimit set for : 1399132743190
> 14/05/03 14:59:03 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/home/hduser/tmp/mapred/staging/hduser/.staging/job_201405031455_0005
> 14/05/03 14:59:03 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
> 14/05/03 14:59:03 ERROR fetcher.Fetcher: Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
>     at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:106)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>     at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
