Hi Sebastian,

Your suggestion actually helped.
Thank you very much.

Slavik

On Tue, Mar 10, 2015 at 9:39 PM, Sebastian Nagel <[email protected]> wrote:

> Hi Slavik,
>
> Assuming that
>   /user/ubuntu/urls/
> contains the seed URLs, it should not also contain the CrawlDb.
> The path in the error message,
>   /user/ubuntu/urls/crawldb
> suggests that the Injector tries to read URLs from the crawldb,
> which is (a) a directory and (b) contains binary data.
>
> Sebastian
>
>
> On 03/10/2015 04:09 PM, Svyatoslav Lavryk wrote:
> > Hello,
> >
> > We are using Nutch 1.9 and Hadoop 1.2.1.
> >
> > When we submit URLs for the initial crawl, everything works fine.
> >
> > When we submit some other URLs the next time, to be added to an already
> > existing crawl db, we receive this error:
> >
> > 15/03/10 13:09:20 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:ubuntu cause:java.io.IOException: Not a file:
> > hdfs://master:9000/user/ubuntu/urls/crawldb
> > 15/03/10 13:09:20 ERROR crawl.Injector: Injector: java.io.IOException: Not
> > a file: hdfs://master:9000/user/ubuntu/urls/crawldb
> >
> > Stack trace:
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
> >     at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
> >     at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
> >     at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:606)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> >
> > We think we have enabled permissions on the folder:
> >
> >     /usr/local/hadoop/bin/hadoop dfs -chmod -R 777 /user/ubuntu/urls/
> >
> > Any ideas will be appreciated.
> >
> > Thanks,
> > Slavik
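[For reference, the fix Sebastian describes can be sketched as the following shell commands. The paths are taken from the thread; the target CrawlDb location (/user/ubuntu/crawl/crawldb) is an assumption for illustration, and the commands require a running Hadoop cluster.]

```shell
# Move the CrawlDb out of the seed-URL directory, so the Injector
# only sees plain-text seed files under /user/ubuntu/urls/.
# (hypothetical target directory /user/ubuntu/crawl)
hadoop fs -mkdir /user/ubuntu/crawl
hadoop fs -mv /user/ubuntu/urls/crawldb /user/ubuntu/crawl/crawldb

# Then inject the new seed URLs into the existing CrawlDb.
# Nutch 1.x usage: bin/nutch inject <crawldb> <url_dir>
bin/nutch inject /user/ubuntu/crawl/crawldb /user/ubuntu/urls
```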

