On Feb 3, 2018, at 10:48, Kien Truong <duckientru...@gmail.com> wrote:
>Speaking from my experience, if the distributed disk fails, the
>checkpoint will fail as well, but the job will continue running. The
>checkpoint scheduler will keep running, so the first scheduled
>checkpoint after you repair your disk should succeed.
>Of course, if you also write to the distributed disk inside your job,
>then your job may crash too, but this is unrelated to the checkpoint.
>On Feb 2, 2018, at 23:30, Christophe Jolif <cjo...@gmail.com> wrote:
>>If I understand correctly, RocksDB uses two disks: the TaskManager's local
>>disk for "local storage" of the state and the distributed disk for
>>checkpoints.
>>- if I have 3 TaskManagers, should I expect (more or less, depending on how
>>the tasks are balanced) to find a third of my overall state stored on
>>each of these TaskManager nodes?
>>- if the local node/disk fails, I will get the state back from the
>>distributed disk and things will start again and all is fine. However, what
>>happens if the distributed disk fails? Will Flink continue processing while
>>waiting for me to mount a new distributed disk? Or will it stop? May I lose
>>data or have to reprocess things under that condition?