[jira] [Commented] (FLINK-2289) Make JobManager highly available

2015-06-29 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605534#comment-14605534
 ] 

Till Rohrmann commented on FLINK-2289:
--

Sorry, I didn't see your issue. Will do.

 Make JobManager highly available
 

 Key: FLINK-2289
 URL: https://issues.apache.org/jira/browse/FLINK-2289
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, the {{JobManager}} is the single point of failure in the Flink 
 system. If it fails, then your job cannot be recovered and the Flink cluster 
 is no longer able to receive new jobs.
 Therefore, it is crucial to make the {{JobManager}} fault tolerant so that 
 the Flink cluster can recover from failed {{JobManager}}. As a first step 
 towards this goal, I propose to make the {{JobManager}} highly available by 
 starting multiple instances and using Apache ZooKeeper to elect a leader. The 
 leader is responsible for the execution of the Flink job. 
 In case that the {{JobManager}} dies, one of the other running {{JobManager}} 
 will be elected as the leader and take over the role of the leader. The 
 {{Client}} and the {{TaskManager}} will automatically detect the new 
 {{JobManager}} by querying the ZooKeeper cluster.
 Note that this does not achieve full fault tolerance for the {{JobManager}} 
 but it allows the cluster to recover from failed {{JobManager}}. The design 
 of high-availability for the {{JobManager}} is tracked in the wiki here [1].
 Resources:
 [1] 
 [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2289) Make JobManager highly available

2015-06-29 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605308#comment-14605308
 ] 

Ufuk Celebi commented on FLINK-2289:


Issue creation race... ;) Can you copy your text to FLINK-2287?

 Make JobManager highly available
 

 Key: FLINK-2289
 URL: https://issues.apache.org/jira/browse/FLINK-2289
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann

 Currently, the {{JobManager}} is the single point of failure in the Flink 
 system. If it fails, then your job cannot be recovered and the Flink cluster 
 is no longer able to receive new jobs.
 Therefore, it is crucial to make the {{JobManager}} fault tolerant so that 
 the Flink cluster can recover from failed {{JobManager}}. As a first step 
 towards this goal, I propose to make the {{JobManager}} highly available by 
 starting multiple instances and using Apache ZooKeeper to elect a leader. The 
 leader is responsible for the execution of the Flink job. 
 In case that the {{JobManager}} dies, one of the other running {{JobManager}} 
 will be elected as the leader and take over the role of the leader. The 
 {{Client}} and the {{TaskManager}} will automatically detect the new 
 {{JobManager}} by querying the ZooKeeper cluster.
 Note that this does not achieve full fault tolerance for the {{JobManager}} 
 but it allows the cluster to recover from failed {{JobManager}}. The design 
 of high-availability for the {{JobManager}} is tracked in the wiki here [1].
 Resources:
 [1] 
 [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)