Thanks for this tip. I have now adapted my hadoop-site.xml to use a big disk
for temporary storage.
Regards
Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

ML mail wrote:
> Thanks for your answer! So I will move on and use the latest nightly build
> instead of the 0.9 stable version. Hopefully the nightly build is stable
> enough to use in a production environment.
>
>
> Lyndon Maydwell wrote:
> From what I have read, this has been solved in recent revisions, so
> downloading a new build or checking out the latest source should solve
> the problem. I am still using a version that has this problem, but I
> should be switching shortly. My solution in the meantime has been to
> delete the temporary files after crawling. This works for me, and I
> suspect the problem is due to Nutch failing to delete these files.
>
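(For reference, the manual cleanup Lyndon describes above would look
something like the following - a sketch only, assuming Hadoop's default
temp location under /tmp; the exact path depends on your hadoop.tmp.dir
setting:

  rm -rf /tmp/hadoop-$USER/*
)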
In fact, I doubt this would solve your problem. The latest trunk doesn't
change the temporary space usage in any significant way, so if you ran out
of space before, you will run out again with the latest nightly build.
The solution is to configure Hadoop to use a different place than /tmp for
temporary files, a place with enough disk space to hold all downloaded and
temporary data. You can configure this by adding the following property to
conf/hadoop-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/my/large/disk/space/hadoop-${user.name}</value>
</property>
(if you run Hadoop in non-local mode, you need to restart the cluster).
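To confirm the change took effect, run a job and check that working data
now shows up under the new location, e.g. (using the example path above,
and assuming your shell user matches Java's user.name):

  ls /my/large/disk/space/hadoop-$USER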
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
__________________________________________________