Sent from TypeApp

On Feb 3, 2018, 10:48, Kien Truong <duckientru...@gmail.com> wrote:
>Hi,
>
>Speaking from my experience, if the distributed disk fails, the
>checkpoint will fail as well, but the job will continue running. The
>checkpoint scheduler will keep running, so the first scheduled
>checkpoint after you repair your disk should succeed.
>
>Of course, if you also write to the distributed disk inside your job,
>then your job may crash too, but this is unrelated to the checkpoint
>process.
>
>Best regards,
>Kien
>
>On Feb 2, 2018, 23:30, Christophe Jolif <cjo...@gmail.com> wrote:
>>If I understand correctly, RocksDB uses two disks: the TaskManager's
>>local disk for "local storage" of the state, and the distributed disk
>>for checkpointing.
>>
>>Two questions:
>>
>>- If I have 3 TaskManagers, should I expect to find roughly a third of
>>my overall state stored on the local disk of each TaskManager node
>>(depending on how the tasks are balanced)?
>>
>>- If the local node/disk fails, I will get the state back from the
>>distributed disk, processing will restart, and all is fine. But what
>>happens if the distributed disk fails? Will Flink continue processing
>>while waiting for me to mount a new distributed disk, or will it stop?
>>Might I lose data or reprocess things under that condition?
>>
>>--
>>Christophe Jolif
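For reference, the setup discussed above corresponds to configuring Flink's RocksDB state backend with a local working directory on each TaskManager and a checkpoint directory on distributed storage. A minimal `flink-conf.yaml` sketch; the paths are placeholders, not values from this thread:

```yaml
# Keep operator state in RocksDB on each TaskManager's local disk
state.backend: rocksdb

# Local directory where RocksDB stores its working files
# (hypothetical path; adjust to your environment)
state.backend.rocksdb.localdir: /mnt/local-ssd/flink/rocksdb

# Distributed storage (e.g. HDFS) that receives checkpoints; as Kien
# notes, if this file system is unavailable the checkpoints fail but
# the job itself keeps running
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
```

With this layout, losing a TaskManager's local disk means the state is restored from the latest completed checkpoint on the distributed storage, while losing the distributed storage only fails subsequent checkpoints until it is repaired.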