The filecrush tool has a small utility called Clean that accepts an age argument and deletes all the files in a directory older than a certain time.
We use Clean to clean up the tmp HDFS directories that applications leave remnants in.

Edward

On 6/1/12, Vinod Singh <vi...@vinodsingh.com> wrote:
> Yes, that is how I do it. Though 1 month is too long, I keep it at just 2 days.
>
> Thanks,
> Vinod
>
> http://blog.vinodsingh.com/
>
> On Fri, Jun 1, 2012 at 2:15 PM, Ruben de Vries <ruben.devr...@hyves.nl> wrote:
>> So I should write a job which cleans up 1 month old results or something
>> like that?
>>
>> From: Vinod Singh [mailto:vi...@vinodsingh.com]
>> Sent: Friday, June 01, 2012 10:35 AM
>> To: user@hive.apache.org
>> Subject: Re: Hive scratch dir not cleaning up
>>
>> Hive deletes job contents from the scratch directory on completion of the
>> job. Failed / killed jobs, however, leave data there, which needs to be
>> removed manually.
>>
>> Thanks,
>> Vinod
>>
>> http://blog.vinodsingh.com/
>>
>> On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries <ruben.devr...@hyves.nl> wrote:
>> Hey Hivers,
>>
>> I'm almost ready to replace our old Hadoop implementation with an
>> implementation using Hive.
>>
>> Now I've run into (hopefully) my last problem: my /tmp/hive-hduser dir is
>> getting kinda big!
>> It doesn't seem to clean up these tmp files. Googling for it, I ran into
>> some tickets about a cleanup setting. Should I enable it with the below
>> setting? Why doesn't it do that by default? Am I the only one somehow
>> racking up a lot of space with tmp files?
>>
>> <property>
>>   <name>hive.start.cleanup.scratchdir</name>
>>   <value>true</value>
>> </property>
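The age-based cleanup that Clean performs (the real tool works against HDFS via the Hadoop FileSystem API) can be sketched as a minimal local-filesystem analogue. The function name and the local-FS behavior here are illustrative assumptions, not filecrush's actual implementation:

```python
import os
import time

def clean_old_files(directory, max_age_seconds):
    """Illustrative sketch: delete regular files in `directory` whose
    modification time is older than max_age_seconds. The real Clean
    utility does the equivalent against HDFS paths, not the local FS.
    Returns the list of deleted paths."""
    cutoff = time.time() - max_age_seconds
    deleted = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only remove plain files older than the cutoff; leave
        # directories and newer files untouched.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            deleted.append(path)
    return deleted
```

For the scratch-dir problem in the thread, the same idea applied manually would be removing entries under /tmp/hive-hduser that are older than the retention window you choose (2 days in Vinod's setup).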