I definitely agree with you; I have hundreds of MB in my /tmp/hadoop dir.
I think we should open an issue in JIRA; if you want, I could open one.

Enrico

On 4/3/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have been raising this point for quite some time.
>
> Right now, when we have a new job, we store the job.jar and job.xml files in
> the job tracker. The task tracker, if I am right, uses these job.jar and
> job.xml files.
>
> Shouldn't we clean up after the job has completed (that is, purge these
> files)? The purging can be done immediately, as soon as the job
> completes.
>
> I find that these files consume a lot of disk space and have to be
> deleted.
>
> Has anyone else noticed the same problem? At the end of a single crawl I
> find that this has taken up at least 30 MB of disk space.
>
> There are two alternatives:
> 1) delete the temporary files with a shell script
> 2) clean up as and when needed (when the job completes, do the cleanup in code)
>
> The problem with option 1 is that other instances of Nutch may be running, in
> which case the shell script must not delete files that another
> instance is still using.
> So it is better to stick with option 2.
>
> Rgds
> Prabhu
>
>
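
To make option 2 a bit more concrete, here is a minimal sketch of what a
cleanup step run at job completion might look like. It uses only the plain
JDK; JobFileCleaner and purgeJobFiles are hypothetical names, not existing
Hadoop or Nutch code, and it assumes each job keeps its job.jar and job.xml
in its own directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Hadoop or Nutch.
public class JobFileCleaner {

    // Deletes job.jar and job.xml from the given per-job directory
    // (assumed layout), then removes the directory if it is empty.
    // Returns the number of files that were actually deleted.
    public static int purgeJobFiles(Path jobDir) throws IOException {
        if (!Files.isDirectory(jobDir)) {
            return 0; // nothing to clean up
        }
        int deleted = 0;
        for (String name : new String[] {"job.jar", "job.xml"}) {
            if (Files.deleteIfExists(jobDir.resolve(name))) {
                deleted++;
            }
        }
        // Remove the directory only when nothing else is left in it,
        // so unrelated files are never touched.
        boolean empty;
        try (Stream<Path> entries = Files.list(jobDir)) {
            empty = !entries.findAny().isPresent();
        }
        if (empty) {
            Files.delete(jobDir);
        }
        return deleted;
    }
}
```

Because the cleanup only touches the directory of the job that just
finished, it should not interfere with other running instances, which is
the concern raised about the shell script approach.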

