I would like to ask the same question. The crawls I ran last night failed because temp files had exhausted the disk space. Why doesn't Nutch clean these up when it is done with them? Is there a setting somewhere to remove them that I'm not seeing? If not, is there some way for me to determine which temp files are associated with a given crawl? Simply deleting the whole temp directory when a crawl finishes is not an option, because I run multiple crawls concurrently. I can't just blow the temp files away and have my other crawls fail.
Ann

----- Original Message ----
From: Lyndon Maydwell <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, September 17, 2007 9:14:49 AM
Subject: Re: free disk space

Thanks Doğacan,

I went ahead and did this anyway after checking that they weren't being used, and all was well, but does Nutch usually take up this much space in temp files? I'm running the crawl on a server that never gets restarted, so I can't have all the drive space used up like this. I can write a cron job to regularly remove the files, but this seems a bit haphazard.

Thanks again for your reply,
Lyndon.
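
For reference, the cron-driven cleanup mentioned in the quoted message could be sketched roughly as follows. This is only a sketch and not a Nutch feature: the function name remove_stale_files, its parameters, and the example path are hypothetical. It assumes that a temp file left unmodified for longer than your longest crawl is no longer in use, which may not hold for every job, so it defaults to a dry run that only reports what it would delete.

```python
import os
import time


def remove_stale_files(root, max_age_seconds, now=None, dry_run=True):
    """Return the files under `root` whose last modification is older than
    `max_age_seconds`; delete them as well when `dry_run` is False.

    Hypothetical helper: `root` should point at whatever temp directory
    your crawls actually write to.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(path)
                if age > max_age_seconds:
                    if not dry_run:
                        os.remove(path)
                    stale.append(path)
            except OSError:
                # The file vanished or is not accessible (for example,
                # another crawl removed it); skip it and carry on.
                continue
    return stale


# Example call (the path is an assumption, adjust it to your setup):
# remove_stale_files("/path/to/crawl/tmp", max_age_seconds=2 * 24 * 3600)
```

Because the check is based on modification time rather than on which crawl owns a file, it is a heuristic. Choosing a threshold comfortably longer than the longest concurrent crawl reduces, but does not eliminate, the risk of removing files that a running crawl still needs.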
