The filecrush tool has a small utility called Clean that accepts an
age argument and deletes all the files in a directory older than a
certain time.

We use Clean to clean up the tmp HDFS directories that applications
leave remnants in.
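The same age-based sweep can be sketched in Python against a local directory (an HDFS version would go through the Hadoop filesystem API instead; the function name and threshold here are just illustrative, not filecrush's actual interface):

```python
import os
import time

def clean_old_files(directory, max_age_seconds):
    """Delete regular files under `directory` whose mtime is older
    than max_age_seconds; return the paths that were removed."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only touch regular files; leave subdirectories alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed
```

Run on a cron schedule against the scratch directory, this keeps only files younger than the chosen threshold.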

Edward

On 6/1/12, Vinod Singh <vi...@vinodsingh.com> wrote:
> Yes, that is how I do it. Though 1 month is too long; I keep it at just 2 days.
>
> Thanks,
> Vinod
>
> http://blog.vinodsingh.com/
>
> On Fri, Jun 1, 2012 at 2:15 PM, Ruben de Vries
> <ruben.devr...@hyves.nl>wrote:
>
>> So I should write a job which cleans up 1 month old results or something
>> like that?
>>
>> From: Vinod Singh [mailto:vi...@vinodsingh.com]
>> Sent: Friday, June 01, 2012 10:35 AM
>> To: user@hive.apache.org
>> Subject: Re: Hive scratch dir not cleaning up
>>
>> Hive deletes job contents from the scratch directory on completion of the
>> job. However, failed / killed jobs leave data there, which needs to be
>> removed manually.
>>
>> Thanks,
>> Vinod
>>
>> http://blog.vinodsingh.com/
>> On Fri, Jun 1, 2012 at 1:58 PM, Ruben de Vries <ruben.devr...@hyves.nl>
>> wrote:
>> Hey Hivers,
>>
>> I’m almost ready to replace our old Hadoop implementation with an
>> implementation using Hive.
>>
>> Now I’ve run into (hopefully) my last problem; my /tmp/hive-hduser dir is
>> getting kinda big!
>> It doesn’t seem to clean up these tmp files. Googling for it, I ran into
>> some tickets about a cleanup setting; should I enable it with the below
>> setting?
>> Why doesn’t it do that by default? Am I the only one somehow racking up a
>> lot of space with tmp files?
>>
>>
>>
>>
>> <property>
>>   <name>hive.start.cleanup.scratchdir</name>
>>   <value>true</value>
>> </property>
>>
>>
>
