Hello Martin,

I am in the final stages of testing a similar tool. It will allow anyone to 
specify a path and a time range for deletion. I will let everyone know when 
it is available.

I will also be taking a look at Apache Falcon.

Thank you,


Joe

-----Original Message-----
From: Flavio Pompermaier [mailto:[email protected]] 
Sent: Monday, July 13, 2015 4:19 PM
To: [email protected]
Subject: Re: HDFS cleanup after certain time

Have you ever looked at Apache Falcon?
On 13 Jul 2015 23:15, "Martin Chalupa" <[email protected]> wrote:

> Hello everyone,
>
> I am thinking about how to solve the following problem. I have an
> Oozie workflow which produces some intermediate results and some
> final results on HDFS. I would like to ensure that those files are
> deleted after a certain time, and I would like to achieve that with
> just Oozie and the Hadoop ecosystem. My workflow gets a working
> directory as an input, so I know that all files will be created
> within that directory. My idea is to create a coordinator job in the
> first step of the workflow. This coordinator would be configured to
> fire exactly once after a configured period and would execute a very
> simple Oozie workflow that just removes the given working directory.
>
> What do you think about this approach?
>
> I know that there is no support for creating a coordinator from
> within a workflow, so I will probably have to implement that as a
> java action. It also means that there will be one coordinator for
> each workflow. Is there any limit on how many coordinators can be
> active?
>
> Thank you
> Martin
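
For anyone who wants to try the approach Martin describes above in the
meantime, here is a minimal sketch of a one-shot coordinator plus the tiny
cleanup workflow it would run. All names, paths, and property values below
are placeholders, and the start/end handling may need adjusting for your
Oozie version; the intent is that exactly one action materializes.

  <coordinator-app name="cleanup-coord" frequency="${coord:days(1)}"
                   start="${cleanupTime}" end="${cleanupEnd}"
                   timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <!-- set cleanupEnd shortly after cleanupTime (within one frequency
         interval) so that only a single action is materialized -->
    <action>
      <workflow>
        <app-path>${cleanupWorkflowPath}</app-path>
        <configuration>
          <property>
            <name>workingDir</name>
            <value>${workingDir}</value>
          </property>
        </configuration>
      </workflow>
    </action>
  </coordinator-app>

  <workflow-app name="cleanup-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="delete-working-dir"/>
    <action name="delete-working-dir">
      <fs>
        <!-- deletes the working directory (recursively for directories) -->
        <delete path="${workingDir}"/>
      </fs>
      <ok to="end"/>
      <error to="fail"/>
    </action>
    <kill name="fail">
      <message>Cleanup failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
  </workflow-app>

As Martin notes, the coordinator would still have to be submitted from the
parent workflow, most likely from a java action that uses the Oozie client
API (OozieClient.run() with oozie.coord.application.path pointing at the
coordinator definition). Since the coordinator has a fixed end, it completes
on its own after its single action runs, so the number of active
coordinators stays roughly bounded by the number of workflows in flight.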
