Markus, thanks. The issue was that I was setting the PATH variable inside the bin/crawl script; once I removed it and set it outside of bin/crawl, it started working fine.
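For anyone who hits the same thing, the fix has roughly this shape (a sketch, not my exact setup; the Hadoop install path is an assumption): export the Hadoop bin directory onto PATH in the environment that launches the crawl, and leave bin/crawl itself unmodified.

```shell
#!/bin/sh
# Sketch of the fix: set PATH in the calling environment (e.g. ~/.bashrc or a
# wrapper script) instead of overriding it inside bin/crawl.
# /opt/hadoop-2.3.0/bin is an assumed install location, not taken from my cluster.
export PATH="$PATH:/opt/hadoop-2.3.0/bin"

# Then invoke bin/crawl unmodified, e.g.:
# bin/crawl urls/ crawldirectory/ 2

# Quick sanity check that the directory really is on PATH now:
echo "$PATH" | grep -q "/opt/hadoop-2.3.0/bin" && echo "hadoop on PATH"
```

Overriding PATH inside bin/crawl can silently hide the `hadoop` binary (or shadow it with the wrong one), which is why the symptom only showed up at job-submission time.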
On Tue, Sep 16, 2014 at 6:39 AM, Markus Jelsma <[email protected]> wrote:

> Hi - you made Nutch believe that
> hdfs://server1.mydomain.com:9000/user/df/crawldirectory/segments/ is a
> segment, but it is not. So either no segment was created or it was written
> to the wrong location.
>
> I don't know what kind of script you are using, but you should check the
> return code of the generator; it gives a -1 if no segment was created.
>
> Markus
>
> -----Original message-----
> > From: Meraj A. Khan <[email protected]>
> > Sent: Monday 15th September 2014 7:02
> > To: [email protected]
> > Subject: Fetch Job Started Failing on Hadoop Cluster
> >
> > Hello Folks,
> >
> > My Nutch crawl, which was running fine, started failing in the first
> > Fetch Job/Application. I am unable to figure out what is going on here.
> > I have attached the last snippet of the log below; can someone please
> > let me know what is going on?
> >
> > What I noticed is that even though the generate phase created a segment
> > 20140915004940, the fetch phase is looking only at the segments
> > directory itself for the segments.
> >
> > Thanks.
> >
> > 14/09/15 00:50:07 INFO crawl.Generator: Generator: finished at
> > 2014-09-15 00:50:07, elapsed: 00:00:59
> > ls: cannot access crawldirectory/segments/: No such file or directory
> > Operating on segment :
> > Fetching :
> > 14/09/15 00:50:09 INFO fetcher.Fetcher: Fetcher: starting at 2014-09-15 00:50:09
> > 14/09/15 00:50:09 INFO fetcher.Fetcher: Fetcher: segment: crawldirectory/segments
> > 14/09/15 00:50:09 INFO fetcher.Fetcher: Fetcher Timelimit set for : 1410767409664
> > Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library
> > /opt/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled
> > stack guard. The VM will try to fix the stack guard now.
> > It's highly recommended that you fix the library with 'execstack -c
> > <libfile>', or link it with '-z noexecstack'.
> > 14/09/15 00:50:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
> > library for your platform... using builtin-java classes where applicable
> > 14/09/15 00:50:10 INFO client.RMProxy: Connecting to ResourceManager at
> > server1.mydomain.com/170.75.152.162:8040
> > 14/09/15 00:50:10 INFO client.RMProxy: Connecting to ResourceManager at
> > server1.mydomain.com/170.75.152.162:8040
> > 14/09/15 00:50:12 INFO mapreduce.JobSubmitter: Cleaning up the staging area
> > /tmp/hadoop-yarn/staging/df/.staging/job_1410742329411_0010
> > 14/09/15 00:50:12 WARN security.UserGroupInformation:
> > PriviledgedActionException as:df (auth:SIMPLE)
> > cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not
> > exist: hdfs://server1.mydomain.com:9000/user/df/crawldirectory/segments/crawl_generate
> > 14/09/15 00:50:12 WARN security.UserGroupInformation:
> > PriviledgedActionException as:df (auth:SIMPLE)
> > cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not
> > exist: hdfs://server1.mydomain.com:9000/user/df/crawldirectory/segments/crawl_generate
> > 14/09/15 00:50:12 ERROR fetcher.Fetcher: Fetcher:
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > hdfs://server1.mydomain.com:9000/user/df/crawldirectory/segments/crawl_generate
> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
> >     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
> >     at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:108)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
> >     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> >     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
> >     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1349)
> >     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1385)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1358)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:606)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
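Markus's suggestion (stop the loop when the generator reports no new segment, and pass the fetcher a specific timestamp-named segment rather than the segments/ directory itself) could be wired into a crawl loop roughly like this. This is a standalone sketch, not the actual bin/crawl: the `bin/nutch generate` call is stubbed out so the snippet runs anywhere, and the segment directory is simulated locally where the real script would operate on HDFS paths.

```shell
#!/bin/sh
# Sketch of the guard described above. "generate" here is a stub standing in
# for: bin/nutch generate "$CRAWLDIR/crawldb" "$CRAWLDIR/segments" ...
generate() { return 0; }

generate
if [ $? -ne 0 ]; then
  # Nutch's -1 surfaces as a non-zero shell exit code.
  echo "Generator created no segment; stopping crawl loop." >&2
  exit 1
fi

# Simulated stand-in for crawldirectory/segments/ (on a real cluster this
# would be an HDFS listing via `hadoop fs -ls`).
SEGDIR=$(mktemp -d)
mkdir -p "$SEGDIR/20140914120000" "$SEGDIR/20140915004940"

# Pick the newest timestamp-named segment, not the segments/ dir itself.
SEGMENT="$SEGDIR/$(ls "$SEGDIR" | sort -n | tail -1)"
echo "Fetching: $SEGMENT"   # real script: bin/nutch fetch "$SEGMENT" ...
```

Passing segments/ itself is exactly what produced the `Input path does not exist: .../segments/crawl_generate` error in the log above, since the fetcher expects a crawl_generate subdirectory inside a single segment.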

