Forgot the link. github.com/edwardcapriolo/filecrush
On 6/1/12, Edward Capriolo &lt;edlinuxg...@gmail.com&gt; wrote:
> The filecrush tool has a small utility called Clean that accepts an
> age argument and deletes all the files in a directory older than a
> certain time.
>
> We use Clean to clean up the tmp HDFS directories applications leave
> remnants in.
>
> Edward
>
> On 6/1/12, Vinod Singh &lt;vi...@vinodsingh.com&gt; wrote:
>> Yes, that is how I do it. Though 1 month is too long; I keep it at just 2 days.
>>
>> Thanks,
>> Vinod
>>
>> http://blog.vinodsingh.com/
>>
>> On Fri, Jun 1, 2012 at 2:15 PM, Ruben de Vries &lt;ruben.devr...@hyves.nl&gt; wrote:
>>
>>> So I should write a job which cleans up month-old results or something
>>> like that?
>>>
>>> From: Vinod Singh [mailto:vi...@vinodsingh.com]
>>> Sent: Friday, June 01, 2012 10:35 AM
>>> To: user@hive.apache.org
>>> Subject: Re: Hive scratch dir not cleaning up
>>>
>>> Hive deletes job contents from the scratch directory on completion of
>>> the job. However, failed / killed jobs leave data there, which needs to
>>> be removed manually.
>>>
>>> Thanks,
>>> Vinod
>>>
>>> http://blog.vinodsingh.com/
>>>
>>> On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries &lt;ruben.devr...@hyves.nl&gt;
>>> wrote:
>>> Hey Hivers,
>>>
>>> I'm almost ready to replace our old Hadoop implementation with an
>>> implementation using Hive.
>>>
>>> Now I've run into (hopefully) my last problem: my /tmp/hive-hduser dir
>>> is getting kinda big!
>>> It doesn't seem to clean up these tmp files. Googling for it, I ran into
>>> some tickets about a cleanup setting; should I enable it with the
>>> setting below?
>>> Why doesn't it do that by default? Am I the only one somehow racking up
>>> a lot of space with tmp files?
>>>
>>> &lt;property&gt;
>>>   &lt;name&gt;hive.start.cleanup.scratchdir&lt;/name&gt;
>>>   &lt;value&gt;true&lt;/value&gt;
>>> &lt;/property&gt;
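The periodic cleanup job discussed above boils down to listing the scratch directory and removing entries whose modification date is older than a cutoff. A minimal sketch of that manual approach, assuming the default `hadoop fs -ls` output format (modification date in field 6, path in field 8; verify against your Hadoop version): the `old_paths` helper and the canned listing are illustrative, and `/tmp/hive-hduser` is the directory named in the thread.

```shell
#!/bin/sh
# old_paths: read `hadoop fs -ls`-style lines on stdin and print the paths
# of entries whose date (field 6, YYYY-MM-DD) is older than the cutoff.
# String comparison works here because ISO dates sort lexicographically.
old_paths() {
  cutoff="$1"
  awk -v cutoff="$cutoff" '$6 < cutoff { print $8 }'
}

# Canned sample of what `hadoop fs -ls /tmp/hive-hduser` might print:
ls_output='drwxr-xr-x   - hduser supergroup          0 2012-04-15 09:12 /tmp/hive-hduser/hive_2012-04-15_job1
drwxr-xr-x   - hduser supergroup          0 2012-05-30 17:44 /tmp/hive-hduser/hive_2012-05-30_job2'

# Print everything last modified before 2012-05-01:
echo "$ls_output" | old_paths 2012-05-01

# On a real cluster you would feed the survivors to the delete command, e.g.:
#   hadoop fs -ls /tmp/hive-hduser | old_paths 2012-05-01 | xargs -r hadoop fs -rmr
```

This keeps the date filtering separate from the deletion, so you can dry-run the listing before wiring in the destructive step.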