Hi Vincent,

that's an issue with newer Hadoop versions (cf.
https://issues.apache.org/jira/browse/HADOOP-7682).
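
For context: since that change, FileUtil.setPermission() verifies
that the underlying chmod-style call succeeded and throws an
IOException when it does not, which it reliably does for the
staging directory on plain Windows. A paraphrase of that check
(not the literal Hadoop source; compare FileUtil.java:691 in the
stack trace below):

  import java.io.File;
  import java.io.IOException;

  // Paraphrase of the check in newer hadoop-core (HADOOP-7682);
  // not the literal Hadoop source.
  class PermissionCheckSketch {
      // 'rv' is the result of the chmod-style call; on plain
      // Windows it comes back false, so job submission dies here
      // before the Injector can run.
      static void checkReturnValue(boolean rv, File p, String octalPerms)
              throws IOException {
          if (!rv) {
              throw new IOException("Failed to set permissions of path: "
                      + p + " to " + octalPerms);
          }
      }
  }

Older 0.20.x releases did not enforce this check, which is why the
downgrade suggested below helps.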

You are not alone:
http://lucene.472066.n3.nabble.com/Using-nutch-1-6-in-Windows-7-td4028935.html
http://florianhartl.com/nutch-installation.html
http://stackoverflow.com/questions/11164940/crawling-using-nutch-shows-an-ioexception

I have no Windows machine at hand to test a solution, but you
could try downgrading Hadoop by replacing
 $NUTCH_HOME/lib/hadoop-core-1.2.0.jar
with a 0.20.x version. The interfaces Nutch uses are unchanged,
so the swap should work without downgrading Nutch itself. A
sketch of the procedure follows.
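
A minimal sketch from a Cygwin shell, assuming wget is installed
and that the jar comes from Maven Central (the exact 0.20.x
version, 0.20.2 here, and the URL are illustrative, not tested):

  $ cd $NUTCH_HOME/lib
  $ # keep a backup; *.jar.bak stays off the classpath
  $ mv hadoop-core-1.2.0.jar hadoop-core-1.2.0.jar.bak
  $ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar
  $ cd $NUTCH_HOME && bin/nutch crawl urls -dir crawl -depth 1

bin/nutch puts every lib/*.jar on the classpath, so renaming the
old jar (rather than leaving both in place) avoids loading two
hadoop-core versions at once.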

Sebastian

On 07/08/2013 06:58 AM, Anup Kuri, Vincent wrote:
> Hi all,
> 
> So I'm trying to get Nutch working on Windows 7. I downloaded the binary 
> release of the latest version, Nutch 1.7, extracted it, and set up 
> JAVA_HOME correctly.
> When I run the crawl command, I get the following,
> 
> $ bin/nutch crawl urls -dir crawl -depth 1
> C:\Java
> cygpath: can't convert empty path
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 1
> solrUrl=null
> Injector: starting at 2013-07-08 10:12:54
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-vanupkuri\mapred\staging\vanupkuri1659857559\.staging to 0700
>         at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
>         at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
>         at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
>         at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
>         at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
>         at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
>         at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> 
> I don't want to downgrade to an older version of Nutch, as others 
> recommend. Can someone help me fix this issue?
> 
> Regards,
> Vincent Anup Kuri
> 
> 
