Hi,

We recently removed some cleanup code because it had to query store metadata 
to determine when a directory could be deleted. For certain stores (like S3), 
requesting this metadata on every file deletion was so expensive that it could 
bring down the job, because state removal could not be processed fast enough. 
We have a temporary fix in place now, so that large-scale jobs can still run 
reliably on stores like S3. Currently, this comes at the cost of not cleaning 
up directories, but we definitely plan to introduce a different mechanism for 
directory cleanup in the future, one that is more coarse-grained than issuing 
a metadata query per file delete. In the meantime, unfortunately, the best 
option is to clean up empty directories with an external tool.
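
As a rough illustration of what such an external tool might do, here is a 
minimal Python sketch. It is not part of Flink. It assumes that empty 
directories show up in the bucket listing as placeholder keys ending in "/" 
(other file system implementations may use a different marker convention), and 
the function name find_empty_dir_markers is made up for this example. Given a 
list of object keys, it returns the placeholder keys that have no real object 
beneath them:

```python
import bisect


def find_empty_dir_markers(keys):
    """Return directory placeholder keys with no real object beneath them.

    Assumption: a "directory" is represented by a key ending in "/".
    Nested placeholders do not count as content, so a directory that only
    contains other empty directories is reported as empty too.
    """
    markers = sorted(k for k in keys if k.endswith("/"))
    objects = sorted(k for k in keys if not k.endswith("/"))

    empty = []
    for marker in markers:
        # In sorted order, keys sharing a prefix are contiguous, so the first
        # object key >= marker is the only candidate child we need to check.
        i = bisect.bisect_left(objects, marker)
        has_child = i < len(objects) and objects[i].startswith(marker)
        if not has_child:
            empty.append(marker)
    return empty


if __name__ == "__main__":
    listing = [
        "checkpoints/chk-1/",
        "checkpoints/chk-1/state-file",
        "checkpoints/chk-2/",
        "checkpoints/chk-3/",
        "checkpoints/chk-3/sub/",
    ]
    for key in find_empty_dir_markers(listing):
        print("would delete:", key)
```

The key listing and the actual deletes would come from whatever S3 client you 
use. Running this periodically, and skipping directories that belong to 
checkpoints still in progress, is left to the surrounding job.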

Best,
Stefan

> On 20.09.2017, at 01:23, Hao Sun <ha...@zendesk.com> wrote:
> 
> Thanks Elias! Seems like there is no better answer than "do not care about 
> them now", or delete with a background job.
> 
> On Tue, Sep 19, 2017 at 4:11 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:
> There are a couple of related JIRAs:
> 
> https://issues.apache.org/jira/browse/FLINK-7587 
> https://issues.apache.org/jira/browse/FLINK-7266 
> 
> 
> On Tue, Sep 19, 2017 at 12:20 PM, Hao Sun <ha...@zendesk.com> wrote:
> Hi, I am using RocksDB and S3 as the storage backend for my checkpoints.
> Can Flink delete these empty directories automatically, or do I need a 
> background job to do the deletion?
> 
> I know this has been discussed before, but I have not found a concrete 
> answer yet. Thanks
> 
> <image.png>
> 