Thanks Aljoscha.
I haven't checked that bit. Is there any configuration the TaskManagers need in order to find ZK?
Regards,
Chirag

On Wed, 14 Feb 2018 at 7:43 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
Do you see in the logs whether the TaskManagers also connect to ZooKeeper correctly? They need this in order to find the JobManager leader.
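For reference, and as far as the Flink 1.4 documentation describes it, TaskManagers have no separate ZooKeeper setting: they should read the same high-availability options from flink-conf.yaml as the JobManager, so those keys need to be present in the configuration of every TaskManager container as well. A minimal sketch of the relevant keys, assuming Flink 1.4 option names; the ZooKeeper hostnames, the storage path and the cluster-id are hypothetical placeholders for this swarm setup:

```yaml
# flink-conf.yaml: must be the same on the JobManager AND on every TaskManager.
high-availability: zookeeper
# Placeholder hostnames for the three ZooKeeper service instances on the swarm network.
high-availability.zookeeper.quorum: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
# Placeholder path; must be storage that all JobManager instances can reach.
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.path.root: /flink
# Placeholder id; JobManager and TaskManagers must agree on it.
high-availability.cluster-id: /my-flink-cluster
```

If a TaskManager container starts with a flink-conf.yaml that lacks these keys, it would fall back to non-HA mode and look for the JobManager at the configured jobmanager.rpc.address instead of asking ZooKeeper.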
Best,
Aljoscha


On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
Hi,
I am trying to deploy a Flink cluster (1 JM, 2 TMs) on Docker Swarm. For JobManager HA, I have started a 3-node ZooKeeper service on the same swarm network and configured Flink's ZooKeeper quorum with the ZooKeeper service instances.
The JobManager starts with the ZooKeeperLeaderElectionService and is assigned a leader session ID, as the following log statements show (only the relevant lines are included):
    org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService
    org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
    org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.
    JobManager akka.tcp://flink@jobmanager:6123/user/jobmanager was granted leadership with leader session ID Some(1f3b2ec6-77b6-4532-928f-ad8befd5202f).
    Trying to associate with JobManager leader akka.tcp://flink@jobmanager:6123/user/jobmanager
    Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#590681231] - leader session 1f3b2ec6-77b6-4532-928f-ad8befd5202f

But the TaskManagers are not able to register with the JobManager and give the following error:

    Discard message LeaderSessionMessage(00000000-0000-0000-0000-000000000000,RegisterTaskManager(4fc8aceeae1e27e42b9f16df6c0cf5e3,4fc8aceeae1e27e42b9f16df6c0cf5e3 @ a118cdf39114 (dataPort=43017),cores=1, physMem=1044111360, heap=536870912, managed=324208384,1)) because the expected leader session ID 1f3b2ec6-77b6-4532-928f-ad8befd5202f did not equal the received leader session ID 00000000-0000-0000-0000-000000000000.

It seems the ResourceManager was not able to retrieve the leader session ID and passed the all-zero ID instead.
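To make the "Discard message" line easier to read, here is a simplified model of the check it reports. This is a hypothetical sketch, not Flink's code: it assumes the JobManager just compares the leader session ID attached to each incoming message with the one it was granted, and that the all-zero UUID is what a sender attaches when it never learned a session ID from ZooKeeper (in Flink this appears to correspond to the default leader ID used when HA is not enabled).

```python
import uuid

# Assumption: the all-zero UUID means "no leader session known"; in Flink this
# appears to be the default leader ID used when high availability is off.
NULL_LEADER_ID = uuid.UUID(int=0)


def accept_message(expected: uuid.UUID, received: uuid.UUID) -> bool:
    """Hypothetical JobManager-side filter: only messages carrying the
    leader session ID this JobManager was granted get processed."""
    return expected == received


def describe(expected: uuid.UUID, received: uuid.UUID) -> str:
    """Explain why a message would be accepted or discarded."""
    if accept_message(expected, received):
        return "accepted"
    if received == NULL_LEADER_ID:
        reason = "sender has no leader session ID (not using ZooKeeper HA?)"
    else:
        reason = "sender knows a different leader session"
    return f"discarded: {reason}"


# Leader session ID granted to the JobManager in the logs above.
expected = uuid.UUID("1f3b2ec6-77b6-4532-928f-ad8befd5202f")
print(describe(expected, NULL_LEADER_ID))
print(describe(expected, expected))
```

Under these assumptions, the zero ID in the error points at the sender's configuration (it never obtained a session ID via ZooKeeper) rather than at the JobManager's leader election, which did succeed.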
One interesting thing I observed was this ZooKeeper version log message:
The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.

Is this a ZK version problem? Should I be using ZK 3.4.6?
My configuration:
Flink version: 1.4.0
ZK version: 3.4.11 (I just pulled the latest image)
Thanks in advance. 
Chirag

