Hello,

Just looking through this thread now. I believe I understand the problem, and I have updated the JIRA with details about what I think the cause is and a potential remedy.
Thanks
-Mark

> On May 18, 2017, at 9:49 AM, Matt Gilman <[email protected]> wrote:
>
> Thanks for the additional details. They will be helpful when working the
> JIRA. All nodes, including the coordinator, heartbeat to the active
> coordinator. This means that the coordinator effectively heartbeats to
> itself. It appears, based on your log messages, that this is not happening.
> Because no heartbeats were received from any node, the lack of heartbeats from
> the terminated node is not considered.
>
> Matt
>
> Sent from my iPhone
>
>> On May 18, 2017, at 8:30 AM, ddewaele <[email protected]> wrote:
>>
>> Found something interesting in the centos-b debug logging....
>>
>> after centos-a (the coordinator) is killed centos-b takes over. Notice how
>> it "Will not disconnect any nodes due to lack of heartbeat" and how it still
>> sees centos-a as connected despite the fact that there are no heartbeats
>> anymore.
>>
>> 2017-05-18 12:41:38,010 INFO [Leader Election Notification Thread-2]
>> o.apache.nifi.controller.FlowController This node elected Active Cluster
>> Coordinator
>> 2017-05-18 12:41:38,010 DEBUG [Leader Election Notification Thread-2]
>> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Purging old heartbeats
>> 2017-05-18 12:41:38,014 INFO [Leader Election Notification Thread-1]
>> o.apache.nifi.controller.FlowController This node has been elected Primary
>> Node
>> 2017-05-18 12:41:38,353 DEBUG [Heartbeat Monitor Thread-1]
>> o.a.n.c.c.h.AbstractHeartbeatMonitor Received no new heartbeats. Will not
>> disconnect any nodes due to lack of heartbeat
>> 2017-05-18 12:41:41,336 DEBUG [Process Cluster Protocol Request-3]
>> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Received new heartbeat from
>> centos-b:8080
>> 2017-05-18 12:41:41,337 DEBUG [Process Cluster Protocol Request-3]
>> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor
>>
>> Calculated diff between current cluster status and node cluster status as
>> follows:
>> Node: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
>> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
>> updateId=42]]
>> Self: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
>> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
>> updateId=42]]
>> Difference: []
>>
>>
>> 2017-05-18 12:41:41,337 INFO [Process Cluster Protocol Request-3]
>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> 410e7db5-8bb0-4f97-8ee8-fc8647c54959 (type=HEARTBEAT, length=2341 bytes)
>> from centos-b:8080 in 3 millis
>> 2017-05-18 12:41:41,339 INFO [Clustering Tasks Thread-2]
>> o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-18
>> 12:41:41,330 and sent to centos-b:10001 at 2017-05-18 12:41:41,339; send
>> took 8 millis
>> 2017-05-18 12:41:43,354 INFO [Heartbeat Monitor Thread-1]
>> o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in
>> 93276 nanos
>> 2017-05-18 12:41:46,346 DEBUG [Process Cluster Protocol Request-4]
>> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Received new heartbeat from
>> centos-b:8080
>> 2017-05-18 12:41:46,346 DEBUG [Process Cluster Protocol Request-4]
>> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor
>>
>> Calculated diff between current cluster status and node cluster status as
>> follows:
>> Node: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
>> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
>> updateId=42]]
>> Self: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
>> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
>> updateId=42]]
>> Difference: []
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-nifi-users-list.2361937.n4.nabble.com/Nifi-Cluster-fails-to-disconnect-node-when-node-was-killed-tp1942p1950.html
>> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
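
For anyone following the thread, the behavior Matt describes can be illustrated with a small standalone sketch. This is not NiFi's code: the class `HeartbeatMonitorSketch`, the method `findStaleNodes`, and the 40-second silence threshold are all hypothetical names and values chosen for illustration. The sketch assumes a monitor pass that returns early when it has received no heartbeats from any node, which is why a killed node would go unnoticed in that pass. It also assumes that, once at least one heartbeat is present, a connected node with no recorded heartbeat counts as stale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is NOT the NiFi implementation.
class HeartbeatMonitorSketch {

    // Returns the connected nodes that one monitor pass would flag as stale.
    static List<String> findStaleNodes(Set<String> connectedNodes,
                                       Map<String, Long> latestHeartbeatMillis,
                                       long nowMillis,
                                       long maxSilenceMillis) {
        List<String> stale = new ArrayList<>();

        // Assumed guard: with no heartbeats from any node, the pass ends
        // early and no node is examined, including a terminated one.
        if (latestHeartbeatMillis.isEmpty()) {
            return stale;
        }

        // Sorted iteration keeps the output order deterministic.
        for (String node : new TreeSet<>(connectedNodes)) {
            Long last = latestHeartbeatMillis.get(node);
            // Assumption of this sketch: a connected node with no recorded
            // heartbeat is treated the same as one that has gone silent.
            if (last == null || nowMillis - last > maxSilenceMillis) {
                stale.add(node);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Set<String> connected = Set.of("centos-a:8080", "centos-b:8080");
        long now = 100_000L;
        long maxSilence = 40_000L; // hypothetical threshold

        // No heartbeats at all: the guard fires and nothing is flagged.
        Map<String, Long> none = Map.of();
        System.out.println(findStaleNodes(connected, none, now, maxSilence));

        // The coordinator's own heartbeat is present: the silent node is flagged.
        Map<String, Long> selfOnly = Map.of("centos-b:8080", 99_000L);
        System.out.println(findStaleNodes(connected, selfOnly, now, maxSilence));
    }
}
```

Under these assumptions, the first call flags nothing even though centos-a has sent no heartbeat, while the second call flags centos-a because the coordinator's own heartbeat makes the pass run its checks.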
