No need to open an issue... It seems that the svn version has already addressed this problem! Thanks, Nutch developers!

Cheers,
Enrico

On 4/10/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:
> I definitely agree with you; I have hundreds of MB in my /tmp/hadoop dir.
> I think we should open an issue in JIRA; if you want, I could open one.
>
> Enrico
>
> On 4/3/06, Raghavendra Prabhu <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > I have been raising this point for quite some time.
> >
> > Right now, when we have a new job, we store the job.jar and job.xml files in
> > the job tracker. The task tracker, if I am right, uses these job.jar and
> > job.xml files.
> >
> > Shouldn't we clean up after the job has completed (that is, purge these
> > files)? The purging can be done immediately when the job
> > completes.
> >
> > I find that these files consume a lot of disk space and have to be
> > deleted.
> >
> > Has anyone else noticed the same problem? At the end of a single crawl I
> > find that this has taken up at least 30 MB of disk space.
> >
> > There are two alternatives:
> > 1) delete the temporary files with a shell script
> > 2) clean up as we go (when the job completes, have a cleanup in code)
> >
> > The problem with option 1 is that other instances of Nutch may be
> > running, in which case the shell script should not attempt to delete
> > files that another instance is still using.
> > So it is better to stick with option 2.
> >
> > Rgds
> > Prabhu

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
