You might want to check whether the directory from

> Injector: urlDir: di/urls

still exists in your HDFS.

On 06/24/2014 12:30 AM, John Lafitte wrote:
Using Nutch 1.7.

Out of the blue, all of my crawl jobs started failing a few days ago. I checked the user logs and nobody logged into the server, and there were no reboots or any other obvious issues. There is plenty of disk space. Here is the error I'm getting; any help is appreciated:

Injector: starting at 2014-06-24 07:26:54
Injector: crawlDb: di/crawl/crawldb
Injector: urlDir: di/urls
Injector: Converting injected urls to crawl db entries.
Injector: ENOENT: No such file or directory
	at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
	at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:701)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:656)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
	at org.apache.nutch.crawl.Injector.run(Injector.java:318)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Injector.main(Injector.java:308)
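The existence check suggested above can be sketched as a small shell helper (a hedged sketch, not a definitive fix: it assumes the `hadoop` CLI is on the PATH, and `di/urls` is simply the urlDir path taken from the Injector log):

```shell
# Sketch: check whether the Injector's urlDir still exists in HDFS.
# Assumes the `hadoop` CLI is on PATH; `di/urls` is the path from the log above.
check_hdfs_dir() {
  # `hadoop fs -test -d <path>` exits 0 only if <path> exists and is a directory
  if hadoop fs -test -d "$1"; then
    echo "$1 exists"
  else
    echo "$1 missing"
  fi
}

# Example usage:
#   check_hdfs_dir di/urls
```

If the directory is missing, re-creating it and re-uploading the seed URLs (e.g. with `hadoop fs -mkdir` and `hadoop fs -put`) would be the next thing to try.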
-- Kaveh Minooie