Same as for Nutch 2.2.1 in pseudo-distributed mode:

bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 10

from within the deploy dir.

However, I remember reading somewhere that deploy-mode execution for the 1.x
series is different from the 2.x series, and that some more files besides
seed.txt had to be copied over to HDFS, but I can't find what I read.
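For reference, a minimal sketch of the deploy-mode sequence, assuming a single-node HDFS setup, a local seed file at urls/seed.txt, and the Solr URL and depth from the command above (the HDFS paths are relative to the crawling user's home directory, e.g. /user/hduser):

```shell
# Copy the seed list into HDFS; in deploy mode the crawl jobs read their
# input from HDFS, not the local filesystem. Configuration files do not
# need copying -- they are packaged into the job jar when Nutch is built.
hadoop fs -mkdir urls
hadoop fs -put urls/seed.txt urls/

# Run the crawl script from runtime/deploy, pointing it at the HDFS seed
# directory (10 = number of crawl rounds):
bin/crawl urls TestCrawl http://localhost:8983/solr/ 10
```

If the seed directory does not exist in HDFS, the inject/generate steps produce nothing, and the fetcher later fails because no timestamped segment directory was created.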


On Sun, May 4, 2014 at 12:06 AM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> looks like the segment is not "addressed" properly:
>
> hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
>
> Segments are named by a time-stamp, e.g.
>    .../TestCrawl/segments/20140502231126/
> "crawl_generate" is a subdir.
>
> Can you specify the exact commands to run the crawler?
>
> Sebastian
>
> On 05/03/2014 08:30 PM, BlackIce wrote:
> > Hi,
> >
> > what needs to be copyied over to the HDFS in Nutch 1.8? or what is the
> > command? when trying to run the crawl script under /runtime/deploy I get
> > the following:
> >
> > 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: starting at 2014-05-03 14:59:03
> > 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: segment: TestCrawl/segments
> > 14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher Timelimit set for : 1399132743190
> > 14/05/03 14:59:03 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/home/hduser/tmp/mapred/staging/hduser/.staging/job_201405031455_0005
> > 14/05/03 14:59:03 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
> > 14/05/03 14:59:03 ERROR fetcher.Fetcher: Fetcher: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/hduser/TestCrawl/segments/crawl_generate
> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
> >     at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:106)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
> >     at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
> >     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
> >     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:606)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> >
>
>
