I definitely agree with you: I have hundreds of MB in my /tmp/hadoop dir. I think we should open an issue in JIRA; if you want, I could open one.
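To make option 2 below concrete, here is a minimal sketch of a cleanup-on-completion step. It is not Hadoop's or Nutch's actual code: the class name JobFileCleaner, the method purgeJobFiles, and the idea that each job keeps its job.jar and job.xml in its own directory are all assumptions made for illustration. Only the two file names come from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Hadoop or Nutch.
public class JobFileCleaner {
    // File names taken from the thread. The per-job directory layout is an assumption.
    private static final String[] JOB_FILES = {"job.jar", "job.xml"};

    /**
     * Deletes job.jar and job.xml from the given job directory, if present.
     * Returns the number of files actually removed.
     */
    public static int purgeJobFiles(Path jobDir) throws IOException {
        int deleted = 0;
        for (String name : JOB_FILES) {
            // deleteIfExists returns false (instead of throwing) when the file is absent.
            if (Files.deleteIfExists(jobDir.resolve(name))) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a finished job with its two files, then purge them.
        Path jobDir = Files.createTempDirectory("job_demo");
        Files.createFile(jobDir.resolve("job.jar"));
        Files.createFile(jobDir.resolve("job.xml"));
        System.out.println("deleted " + purgeJobFiles(jobDir) + " files");
        Files.delete(jobDir);
    }
}
```

Because the purge only touches the directory of the job that just finished, it should not interfere with another running instance, which is the concern raised about the shell-script approach. Where such a call would be hooked into the job tracker is left open here.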
Enrico

On 4/3/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have been raising this point for quite some time.
>
> Right now, when we have a new job, we store the job.jar and job.xml files in
> the job tracker. The task tracker, if I am right, uses these job.jar and
> job.xml files.
>
> Shouldn't we clean up after the job has completed (that is, purge these
> files)? The purging of these files can be done immediately when the job
> completes.
>
> I find that these files consume a lot of disk space and have to be
> deleted.
>
> Has anyone else noticed the same problem? At the end of a single crawl I
> find that this has taken up at least 30 MB of disk space.
>
> There are two alternatives:
> 1) delete the temporary files with a shell script
> 2) clean up as and when needed (when the job completes, have a cleanup in code)
>
> The problem with option 1 is that there may be other instances of Nutch
> running, in which case the shell script should not attempt to delete files
> that another instance is using.
> So it is better to stick with option 2.
>
> Rgds
> Prabhu

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
