Hi,

We are facing an issue in standalone HA mode in Flink 1.4.0 where a
TaskManager restarts and is then unable to register with the JobManager. It
times out awaiting the AcknowledgeRegistration/AlreadyRegistered message from
the JobManager actor and keeps resending the RegisterTaskManager message. The
JobManager logs show nothing about a registration request or failure, and the
log.debug(s"RegisterTaskManager: $msg") statement (from JobManager.scala) is
never printed either. The network connection between the TaskManager and the
JobManager seems fine: tcpdump shows the message being sent to the JobManager
and a TCP ACK coming back. Note that the communication is happening between
Docker containers.


The following are the logs from the TaskManager:



{"timeMillis":1539189572438,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1400, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}

{"timeMillis":1539189580229,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}

{"timeMillis":1539189600247,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}

{"timeMillis":1539189602458,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1401, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}

{"timeMillis":1539189620251,"thread":"Curator-Framework-0-SendThread(zookeeper.maglev-system.svc.cluster.local:2181)","level":"DEBUG","loggerName":"org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn","message":"Got ping response for sessionid: 0x10000260ea5002d after 0ms","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":101,"threadPriority":5}

{"timeMillis":1539189632478,"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.taskmanager.TaskManager","message":"Trying to register at JobManager akka.tcp://flink@192.168.83.51:6123/user/jobmanager (attempt 1402, timeout: 30000 milliseconds)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":48,"threadPriority":5}
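
In case it helps anyone reading the excerpt, here is a minimal Python sketch that pulls the registration attempts out of the JSON log entries above and computes the gap between consecutive retries. The entries are copied from the log above but trimmed to the two fields the script reads; the names registration_attempts and retry_gaps are made up for this illustration and are not part of Flink. It is only a way to check that the retries follow the 30000 ms timeout, not a diagnosis of why the JobManager never answers.

```python
import json
import re

# Entries taken from the TaskManager log above, trimmed to "timeMillis" and
# "message" (all other fields are dropped for brevity).
LOG_LINES = [
    '{"timeMillis":1539189572438,"message":"Trying to register at JobManager '
    'akka.tcp://flink@192.168.83.51:6123/user/jobmanager '
    '(attempt 1400, timeout: 30000 milliseconds)"}',
    '{"timeMillis":1539189580229,"message":"Got ping response for sessionid: '
    '0x10000260ea5002d after 0ms"}',
    '{"timeMillis":1539189602458,"message":"Trying to register at JobManager '
    'akka.tcp://flink@192.168.83.51:6123/user/jobmanager '
    '(attempt 1401, timeout: 30000 milliseconds)"}',
    '{"timeMillis":1539189632478,"message":"Trying to register at JobManager '
    'akka.tcp://flink@192.168.83.51:6123/user/jobmanager '
    '(attempt 1402, timeout: 30000 milliseconds)"}',
]

# Matches the "(attempt N, timeout: M milliseconds)" suffix of the message.
ATTEMPT_RE = re.compile(r"\(attempt (\d+), timeout: (\d+) milliseconds\)")


def registration_attempts(lines):
    """Return (time_millis, attempt, timeout_ms) for each registration entry."""
    attempts = []
    for line in lines:
        entry = json.loads(line)
        match = ATTEMPT_RE.search(entry.get("message", ""))
        if match:  # skips unrelated entries such as the ZooKeeper pings
            attempts.append(
                (entry["timeMillis"], int(match.group(1)), int(match.group(2)))
            )
    return attempts


def retry_gaps(attempts):
    """Return the milliseconds elapsed between consecutive attempts."""
    return [later[0] - earlier[0] for earlier, later in zip(attempts, attempts[1:])]


if __name__ == "__main__":
    found = registration_attempts(LOG_LINES)
    for (_, attempt, timeout_ms), gap in zip(found[1:], retry_gaps(found)):
        print(f"attempt {attempt}: {gap} ms after the previous one "
              f"(timeout {timeout_ms} ms)")
```

For the three attempts shown, the gaps come out at roughly the 30000 ms timeout, which is consistent with each attempt running to its full timeout without any reply from the JobManager.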
