Hi Slavik,

assuming that /user/ubuntu/urls/ holds the seed URLs, it should not also contain the CrawlDb. The path in the error message, /user/ubuntu/urls/crawldb, suggests that the Injector tries to read URLs from crawldb, which (a) is a directory and (b) contains binary data. The Injector expects only plain-text seed files in its input directory, so keep the CrawlDb somewhere outside of it.
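To check whether anything other than plain seed files sits in the seed directory, something along these lines may help. It is only a rough sketch: the helper name list_seed_subdirs is made up for illustration, it assumes the usual `hadoop dfs -ls` listing format (permissions in the first column, path in the last), and the target path /user/ubuntu/crawl is a hypothetical location, not one taken from your setup.

```shell
# Hypothetical helper: read `hadoop dfs -ls` output on stdin and print
# the paths of entries that are directories (permissions start with "d").
# Assumes paths contain no spaces.
list_seed_subdirs() {
  awk '$1 ~ /^d/ { print $NF }'
}

# Possible usage on the cluster (not executed here):
#   hadoop dfs -ls /user/ubuntu/urls/ | list_seed_subdirs
#
# If crawldb shows up, it could be moved out of the seed directory.
# /user/ubuntu/crawl is an assumed target, adjust it to your layout:
#   hadoop dfs -mkdir /user/ubuntu/crawl
#   hadoop dfs -mv /user/ubuntu/urls/crawldb /user/ubuntu/crawl/crawldb
```

After the move, the crawl would also have to be pointed at the new CrawlDb location; otherwise a fresh one would be created.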
Sebastian

On 03/10/2015 04:09 PM, Svyatoslav Lavryk wrote:
> Hello,
>
> We are using Nutch 1.9 and Hadoop 1.2.1
>
> When we submit URLs for initial crawl, everything works fine.
>
> When we submit some other urls next time to be added to an already existing
> crawl db, we receive error:
>
> 15/03/10 13:09:20 ERROR security.UserGroupInformation: PriviledgedActionException as:ubuntu cause:java.io.IOException: Not a file: hdfs://master:9000/user/ubuntu/urls/crawldb
> 15/03/10 13:09:20 ERROR crawl.Injector: Injector: java.io.IOException: Not a file: hdfs://master:9000/user/ubuntu/urls/crawldb
>
> Stack trace:
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
>   at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>   at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>   at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
>   at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
> We think we have enabled permissions on the folder:
> /usr/local/hadoop/bin/hadoop dfs -chmod -R 777 /user/ubuntu/urls/
>
> Any ideas will be appreciated.
>
> Thanks,
> Slavik

