I am running a standalone flink cluster (1.1.2) and I have a stateful streaming job that uses RocksDB as a state manager. I have two stateful operators that are using ValueState<> and ListState<>. Every now and then my job fails with the following exception
java.lang.Exception: Could not restore checkpointed state to operators and functions at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:552) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:250) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.FileNotFoundException: File file:/data/flink/checkpoints/226c84df02e47d1b9c036ba894503145/StreamMap_12_5/dummy_state/chk-83 does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289) at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:88) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975) at org.apache.flink.streaming.util.HDFSCopyToLocal$1.run(HDFSCopyToLocal.java:48) Can someone help me with this, Is this a known issue ? Thanks Abiy Hailemichael Software Engineer Email: abiybirtu...@gmail.com <hailemichaela...@gmail.com>