[ http://issues.apache.org/jira/browse/NUTCH-159?page=comments#action_12361541 ]

Doug Cutting commented on NUTCH-159:
------------------------------------

mapred.local.dir is the property to set.  If that fails, then there is a bug.
What did you have this set to?
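
For reference, a minimal sketch of such an override. It assumes the property goes in the site-specific override file (conf/nutch-site.xml) and that /data/nutch/tmp is a directory on a partition with enough free space. Both the file location and the path are illustrative assumptions, not taken from the report:

```xml
<!-- Hypothetical override: point map/reduce local working files at a
     larger partition. The path below is only an example. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/nutch/tmp</value>
</property>
```

If the working files still land in /tmp/nutch with this set, that would point to the bug mentioned above.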

> Specify temp/working directory for crawl
> ----------------------------------------
>
>          Key: NUTCH-159
>          URL: http://issues.apache.org/jira/browse/NUTCH-159
>      Project: Nutch
>         Type: Bug
>   Components: fetcher, indexer
>     Versions: 0.8-dev
>  Environment: Linux/Debian
>     Reporter: byron miller

>
> I ran a crawl of 100k web pages and got:
> org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
>         at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
>         at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
>         at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
>         ... 4 more
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
> [EMAIL PROTECTED]:/data/nutch$ df -k
> It appears the crawl created a /tmp/nutch directory that filled up even though I
> specified a db directory.
> Need to add a command-line parameter or a globally configurable
> /tmp (work area) for the Nutch instance so that crawls won't fail.




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
