Re: JobManager HA without Distributed FileSystem

2016-08-24 Thread Stephan Ewen
Hi!
- Concerning replication to other JobManagers: this could be an extension, but it would also need to support replacement JobManagers that come up later. So it would need a replication service in the JobManagers, not just a "send to all" at program startup.
- That would work in th
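
The difference between a one-shot "send to all" and a replication service can be sketched as below. This is a hypothetical, in-memory model: `JobManager`, `broadcast_once` and `ReplicationLog` are illustrative names, not Flink APIs. It only shows why a JobManager that starts after submission misses a startup broadcast but can catch up from a replicated log.

```python
"""Hypothetical sketch: one-shot broadcast vs. a replication log.

Assumption: JobManagers are plain in-memory objects; no real Flink API is used.
"""
from dataclasses import dataclass, field


@dataclass
class JobManager:
    name: str
    artifacts: dict = field(default_factory=dict)  # artifact name -> bytes


def broadcast_once(jms, name, blob):
    # "Send to all" at program startup: only JobManagers alive right now receive it.
    for jm in jms:
        jm.artifacts[name] = blob


class ReplicationLog:
    """Keeps every submitted artifact so late-joining JobManagers can catch up."""

    def __init__(self):
        self.entries = []  # ordered (name, blob) pairs
        self.members = []

    def submit(self, name, blob):
        self.entries.append((name, blob))
        for jm in self.members:
            jm.artifacts[name] = blob

    def join(self, jm):
        # Replay the full log to the newcomer before it receives new entries.
        for name, blob in self.entries:
            jm.artifacts[name] = blob
        self.members.append(jm)


if __name__ == "__main__":
    a, b = JobManager("jm-a"), JobManager("jm-b")
    broadcast_once([a, b], "job.jar", b"...")
    late = JobManager("jm-c")
    print("job.jar" in late.artifacts)  # False: it missed the broadcast

    log = ReplicationLog()
    log.join(a)
    log.join(b)
    log.submit("job.jar", b"...")
    late2 = JobManager("jm-d")
    log.join(late2)
    print("job.jar" in late2.artifacts)  # True: caught up from the log
```

The log is what makes late joiners possible, but it is also state that must itself survive failures, which is part of why this is more than a small extension.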

Re: JobManager HA without Distributed FileSystem

2016-08-24 Thread Konstantin Knauf
Hi Stephan, thanks for the quick response, understood. Is there a reason why the JAR files and the JobGraph are not sent to all JobManagers by the client? Similarly, why don't all TaskManagers send the checkpoint metadata to all JobManagers? I did not have any other storage in mind [1]. I am basically inte

Re: JobManager HA without Distributed FileSystem

2016-08-23 Thread Stephan Ewen
Hi! The state one can store in ZooKeeper is very small (the recommended size is less than 1 MB per handle). For HA, the JobManager needs to persist:
- JobGraph
- JAR files
- Checkpoint Metadata
Those are easily too large for ZooKeeper, which is why Flink currently requires a DFS to store tho
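
The usual way around ZooKeeper's size limit is to write the large data to shared storage and keep only a small handle (a file path) in ZooKeeper. The sketch below illustrates that pattern under loud assumptions: a dict stands in for ZooKeeper, a local temporary directory stands in for the DFS, and `FakeZooKeeper`, `persist_for_ha` and `recover` are hypothetical names, not Flink or ZooKeeper APIs.

```python
"""Hypothetical sketch: large HA data in a file system, only a small handle in ZooKeeper.

Assumptions: a local directory stands in for the DFS and a dict stands in for
ZooKeeper; the 1 MB bound is the per-handle recommendation quoted above.
"""
import os
import tempfile

ZK_MAX_NODE_BYTES = 1024 * 1024  # recommended upper bound per ZooKeeper handle


class FakeZooKeeper:
    def __init__(self):
        self.nodes = {}

    def create(self, path, data: bytes):
        # Reject payloads above the recommended per-node size.
        if len(data) > ZK_MAX_NODE_BYTES:
            raise ValueError(f"{path}: {len(data)} bytes is too large for ZooKeeper")
        self.nodes[path] = data


def persist_for_ha(zk, storage_dir, key, payload: bytes):
    """Write the payload to storage_dir and store only its file path in ZooKeeper."""
    file_path = os.path.join(storage_dir, key)
    with open(file_path, "wb") as f:
        f.write(payload)
    zk.create(f"/jobs/{key}", file_path.encode("utf-8"))
    return file_path


def recover(zk, key):
    """A standby JobManager reads the handle from ZooKeeper, then the payload from storage."""
    file_path = zk.nodes[f"/jobs/{key}"].decode("utf-8")
    with open(file_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    zk = FakeZooKeeper()
    jar = os.urandom(5 * 1024 * 1024)  # a 5 MB "JAR" would not fit in a ZooKeeper node
    with tempfile.TemporaryDirectory() as storage_dir:
        persist_for_ha(zk, storage_dir, "job-graph", jar)
        assert recover(zk, "job-graph") == jar
```

Recovery only works if every JobManager can read `storage_dir`, which in a real cluster means a shared, distributed file system rather than a local directory.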

JobManager HA without Distributed FileSystem

2016-08-23 Thread Konstantin Knauf
Hi all, the documentation of JobManager HA [1] explains that HA is only possible with the FS state backend, because JobManager metadata is saved there. What are the particular problems with using JobManager HA with the MemoryStateBackend? As I understand it, the state is checkpointed to all JobManagers (le