Thanks, Stephan.

We can confirm that turning off RocksDB incremental checkpointing seems to
help and greatly reduces the number of files (from tens of thousands to low
thousands).

We still see an inflection point: running more than 50 jobs causes the
appmaster to stop deleting files from S3, leading to unbounded growth
(slower without incremental checkpointing) of the S3 recovery directory.

We would be willing to try a patched version (we already have a fork). Just
to confirm: are you suggesting we delete the line "fs.delete(filePath, false);"
from discardState()?

```
    @Override
    public void discardState() throws Exception {
        FileSystem fs = getFileSystem();

        fs.delete(filePath, false);

        try {
            FileUtils.deletePathIfEmpty(fs, filePath.getParent());
        } catch (Exception ignored) {}
    }
```
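For anyone following along, here is a runnable sketch of what that method does, using `java.nio.file` as a local stand-in for Flink's `FileSystem` (the class names and local-file setup are our illustration, not Flink code): the state file is deleted, and then the parent directory is removed only if it has become empty.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiscardStateSketch {

    // Local analogue of discardState(): delete the state file,
    // then best-effort delete the parent directory if it is empty.
    static void discardState(Path filePath) throws IOException {
        Files.deleteIfExists(filePath); // the line in question

        try {
            Path parent = filePath.getParent();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent)) {
                // only remove the parent when no entries remain
                if (!stream.iterator().hasNext()) {
                    Files.delete(parent);
                }
            }
        } catch (Exception ignored) {
            // mirrors the swallowed exception in the quoted code
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovery");
        Path state = Files.createFile(dir.resolve("state-1"));

        discardState(state);

        System.out.println(Files.exists(state)); // state file gone
        System.out.println(Files.exists(dir));   // empty parent gone too
    }
}
```

If the `fs.delete(...)` call were removed, discarded state files would stay in the recovery directory and its parent would never become empty, so both deletions would effectively stop, which is why we want to confirm the exact change before patching.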





--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14452.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.
