Hello Martin, I am in the final stages of testing a similar tool. It will allow anyone to specify a path and a time range for deletion. I will let everyone know when it is available.
I will also be taking a look at Apache Falcon.

Thank you,
Joe

-----Original Message-----
From: Flavio Pompermaier [mailto:[email protected]]
Sent: Monday, July 13, 2015 4:19 PM
To: [email protected]
Subject: Re: HDFS cleanup after certain time

Have you ever looked at Apache Falcon?

On 13 Jul 2015 23:15, "Martin Chalupa" <[email protected]> wrote:
> Hello everyone,
>
> I am thinking about how to solve the following problem. I have an Oozie
> workflow that produces some intermediate results and some final results
> on HDFS. I would like to ensure that those files are deleted after a
> certain time, using only Oozie and the Hadoop ecosystem. My workflow
> takes a working directory as an input, so I know that all files will be
> created within that directory. My idea is to create a coordinator job in
> the first step of the workflow. This coordinator would be configured to
> fire exactly once after a configured period and would then execute a very
> simple Oozie workflow that just removes the given working directory.
>
> What do you think about this approach?
>
> I know that there is no support for creating a coordinator from within a
> workflow, so I would probably have to implement that as a Java action. It
> also means that there will be one coordinator per workflow. Is there any
> limit on how many coordinators can be active?
>
> Thank you
> Martin
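For reference, a minimal sketch of the coordinator and cleanup workflow Martin describes could look roughly like the following. The property names (cleanupWorkflowPath, workingDir, cleanupStart, cleanupEnd) and the daily frequency are illustrative assumptions, not something from the thread; the start/end window would be chosen so that exactly one action is materialized.

<!-- Sketch of a one-shot cleanup coordinator (untested, names are assumptions).
     Pick start/end so only a single nominal time falls inside the window. -->
<coordinator-app name="cleanup-coord" frequency="${coord:days(1)}"
                 start="${cleanupStart}" end="${cleanupEnd}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${cleanupWorkflowPath}</app-path>
      <configuration>
        <property>
          <name>workingDir</name>
          <value>${workingDir}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>

<!-- The cleanup workflow itself: a single fs action that deletes the working directory. -->
<workflow-app name="cleanup-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="delete-working-dir"/>
  <action name="delete-working-dir">
    <fs>
      <delete path="${workingDir}"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Cleanup failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>

Since a workflow cannot create a coordinator directly, the coordinator would presumably be submitted from the Java action Martin mentions, for example via org.apache.oozie.client.OozieClient#run(Properties) with the properties above.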
