Hi, I am running Flink 1.3.2 on Kubernetes. Sometimes one of my TaskManagers (TMs) is killed and I am not sure why. Is there a way to debug this? Thanks
===== Logs ====
*2017-10-05 22:36:42,631 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at fps-flink-taskmanager-2384273947-9n4kc (akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274/user/taskmanager) as 330ff7eeaabfe2b7289fee4a0e36c4b2. Current number of registered hosts is 2. Current number of alive task slots is 2.*
2017-10-05 22:37:04,974 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: KafkaSource(maxwell.users) -> MaxwellFilter->Maxwell(maxwell.users) -> FixedDelayWatermark(maxwell.users) -> MaxwellFPSEvent->InfluxDBData(maxwell.users) -> (Sink: influxdbSink(maxwell.users), Sink: PrintSink(maxwell.users)) (1/1) (attempt #0) to fps-flink-taskmanager-2384273947-9n4kc
*2017-10-06 06:08:55,657 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Disassociated]*
2017-10-06 06:08:55,832 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
2017-10-06 06:09:01,232 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:03,416 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:11,174 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:11,440 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
2017-10-06 06:09:21,232 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:27,460 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
2017-10-06 06:09:31,173 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:41,179 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:09:51,174 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:09:57,475 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
2017-10-06 06:10:01,179 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:06,173 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]
2017-10-06 06:10:06,177 INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274/user/taskmanager terminated.
java.lang.Exception: TaskManager was lost/killed: 55d3143ccecec7878f7df169208795d0 @ fps-flink-taskmanager-2384273947-9n4kc (dataPort=37448)
java.lang.Exception: TaskManager was lost/killed: 55d3143ccecec7878f7df169208795d0 @ fps-flink-taskmanager-2384273947-9n4kc (dataPort=37448)
2017-10-06 06:10:06,188 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc]
2017-10-06 06:10:06,240 INFO org.apache.flink.runtime.instance.InstanceManager - Unregistered task manager fps-flink-taskmanager-2384273947-9n4kc/10.225.132.78. Number of registered task managers 3. Number of available slots 3.
2017-10-06 06:10:16,247 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:26,284 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]] Caused by: [fps-flink-taskmanager-2384273947-9n4kc: Name does not resolve]
2017-10-06 06:10:27,495 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@fps-flink-taskmanager-2384273947-9n4kc:40274]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]