ML mail wrote:
Thanks for your answer! So I will move on and use the latest nightly build
instead of the 0.9 stable version. Hopefully the nightly build is stable
enough to use in a production environment.


Lyndon Maydwell <[EMAIL PROTECTED]> wrote:
From what I have read, this has been solved in recent revisions, so
downloading a new build or checking out the latest source should solve
the problem. I am still using a version that has this problem, but
should be switching shortly. My solution in the meantime has been to
delete the temporary files after crawling. This works for me, and I
suspect the problem is caused by Nutch failing to delete its temporary
files.
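
A minimal sketch of that manual cleanup, assuming Hadoop's default
temporary directory of /tmp/hadoop-${user.name} (adjust the path if you
have changed hadoop.tmp.dir):

    # remove Hadoop's leftover temporary files after a crawl finishes
    rm -rf /tmp/hadoop-$USER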


In fact, I doubt this would solve your problem. The latest trunk doesn't change temporary space usage in any significant way, so if you ran out of space before, you will run out just the same with the latest nightly build.

The solution is to configure Hadoop to use a different place than /tmp for temporary files, a place where you have enough disk space to fit all downloaded and temporary data. You can configure this by adding the following to conf/hadoop-site.xml:

<!-- Store Hadoop's temporary data on a volume with enough free space. -->
<property>
        <name>hadoop.tmp.dir</name>
        <value>/my/large/disk/space/hadoop-${user.name}</value>
</property>

(If you run Hadoop in non-local mode, you need to restart the cluster for the change to take effect.)
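
For reference, a minimal restart sequence, assuming the standard
start/stop scripts shipped in Hadoop's bin/ directory and that
HADOOP_HOME points at your installation:

    # restart the whole cluster so the new hadoop.tmp.dir takes effect
    cd $HADOOP_HOME
    bin/stop-all.sh
    bin/start-all.sh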

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
