Hi

I have been raising this point for quite some time.

Right now, when we submit a new job, we store the job.jar and job.xml files
in the job tracker. If I am right, the task tracker uses these job.jar and
job.xml files.

Shouldn't we clean up (that is, purge these files) after the job has
completed? The purging can be done as soon as the job completes.

I find that these files consume a lot of disk space and need to be
deleted.

Has anyone else noticed the same problem? At the end of a single crawl, I
find that these files have taken up at least 30 MB of disk space.

There are two alternatives:
1) delete the temporary files with a shell script
2) clean them up as we go (that is, have cleanup code run when the job
   completes)

The problem with option 1 is that other instances of Nutch may be running,
in which case the shell script should not delete their files.
So it is better to stick with option 2.
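A minimal sketch of option 2, written in Python purely for illustration
(Nutch itself is Java). It assumes each job stages its job.jar and job.xml
in its own directory under a shared staging root; the function name
run_job_with_cleanup, the directory layout and the job-runner callback are
all hypothetical, not Nutch or Hadoop APIs. Because each job deletes only
its own directory in a finally block, the files are purged even if the job
fails, and other running instances keep their files.

```python
import shutil
import tempfile
from pathlib import Path


def run_job_with_cleanup(job_id, staging_root, run_job):
    """Run one job, then always purge that job's own staging directory.

    Hypothetical sketch: staging_root/job_id stands in for wherever the
    job tracker actually keeps job.jar and job.xml.
    """
    job_dir = Path(staging_root) / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder files standing in for the real job.jar / job.xml (assumption).
    (job_dir / "job.jar").write_bytes(b"")
    (job_dir / "job.xml").write_text("<configuration/>")
    try:
        return run_job(job_dir)
    finally:
        # Delete only this job's directory; directories belonging to other
        # running jobs or instances under staging_root are left alone.
        shutil.rmtree(job_dir, ignore_errors=True)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # The fake job just lists the files it can see while it is running.
    result = run_job_with_cleanup(
        "job_0001", root, lambda d: sorted(p.name for p in d.iterdir())
    )
    print(result)  # ['job.jar', 'job.xml']
    print((Path(root) / "job_0001").exists())  # False
    shutil.rmtree(root)
```

Scoping the cleanup to a single job's directory is what sidesteps the
multiple-instance problem of the shell-script approach: no job ever
deletes files it did not create.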

Rgds
Prabhu
