Hi,

I have been raising this point for quite some time.
Right now, when we have a new job, we store the job.jar and job.xml files in the job tracker. The task tracker, if I am right, uses these job.jar and job.xml files. Shouldn't we clean up after the job has completed, that is, purge these files? The purging can be done immediately when the job completes.

I find that these files consume a lot of disk space and have to be deleted. Has anyone else noticed the same problem? At the end of a single crawl I find that this has taken up at least 30 MB of disk space.

There are two alternatives:

1) Delete the temporary files with a shell script.
2) Clean up as and when needed (when the job completes, have a cleanup step in code).

The problem with option 1 is that there may be other instances of Nutch running, in which case the shell script should not attempt to delete the files. So it is better to stick with option 2.

Rgds
Prabhu
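
P.S. To make option 2 concrete, here is a rough sketch of what the cleanup could look like. This is not the existing Nutch code: the class name JobFileCleaner, the method purgeJobFiles, and the idea that each job keeps its job.jar and job.xml in its own directory are all assumptions of mine for illustration. Only the two file names come from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Nutch. Assumes each job has its own
// directory holding job.jar and job.xml.
class JobFileCleaner {
    // File names taken from the message; the directory layout is assumed.
    private static final String[] JOB_FILES = {"job.jar", "job.xml"};

    /**
     * Deletes job.jar and job.xml from the given job directory.
     * Files that are already gone are skipped, and other files are left alone.
     * Returns the number of files actually removed.
     */
    static int purgeJobFiles(Path jobDir) throws IOException {
        int deleted = 0;
        for (String name : JOB_FILES) {
            if (Files.deleteIfExists(jobDir.resolve(name))) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a finished job: a temp directory with the two files.
        Path jobDir = Files.createTempDirectory("job_demo");
        Files.createFile(jobDir.resolve("job.jar"));
        Files.createFile(jobDir.resolve("job.xml"));

        System.out.println("removed " + purgeJobFiles(jobDir) + " job files");
        Files.delete(jobDir);
    }
}
```

The call would go wherever the job is marked complete. Because it only touches the files of the job that just finished, it should not interfere with another instance that is still running, which is the problem with the shell-script approach.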
