Re: Nifi Cluster fails to disconnect node when node was killed

Matt Gilman Thu, 18 May 2017 06:49:36 -0700

Thanks for the additional details. They will be helpful when working on the
JIRA. All nodes, including the coordinator, heartbeat to the active
coordinator. This means that the coordinator effectively heartbeats to itself.
It appears, based on your log messages, that this is not happening. Because no
heartbeats were received from any node, the lack of heartbeats from the
terminated node is not considered.
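
To make the behavior above concrete, here is a minimal sketch of the guard as
described. It is not NiFi's actual code: the function name, the dict of
last-heartbeat times, and the threshold parameter are all hypothetical, and
only nodes with a recorded heartbeat are examined, which is a simplification.

```python
def find_stale_nodes(last_heartbeats, now, threshold_secs):
    """Return node ids whose most recent heartbeat is older than the threshold.

    last_heartbeats: dict mapping node id -> time (seconds) of last heartbeat.
    All names here are hypothetical; this only models the guard described
    in the message, not the real heartbeat monitor.
    """
    if not last_heartbeats:
        # Mirrors the log line "Received no new heartbeats. Will not
        # disconnect any nodes due to lack of heartbeat": with nothing
        # recorded, the staleness check is skipped entirely.
        return []

    stale = []
    for node_id, last_seen in last_heartbeats.items():
        if now - last_seen > threshold_secs:
            stale.append(node_id)
    return sorted(stale)


# No heartbeats recorded at all: the terminated node is never considered.
print(find_stale_nodes({}, now=105.0, threshold_secs=40.0))

# Heartbeats recorded: the node that stopped heartbeating is flagged.
print(find_stale_nodes(
    {"centos-b:8080": 100.0, "centos-a:8080": 50.0},
    now=105.0,
    threshold_secs=40.0,
))
```

In this model, an empty heartbeat map short-circuits the check, so a killed
node stays CONNECTED until some heartbeat is recorded.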


Matt


> On May 18, 2017, at 8:30 AM, ddewaele <[email protected]> wrote:
> 
> Found something interesting in the centos-b debug logging.... 
> 
> After centos-a (the coordinator) is killed, centos-b takes over. Notice how
> it "Will not disconnect any nodes due to lack of heartbeat" and how it still
> sees centos-a as connected despite the fact that there are no heartbeats
> anymore.
> 
> 2017-05-18 12:41:38,010 INFO [Leader Election Notification Thread-2]
> o.apache.nifi.controller.FlowController This node elected Active Cluster
> Coordinator
> 2017-05-18 12:41:38,010 DEBUG [Leader Election Notification Thread-2]
> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Purging old heartbeats
> 2017-05-18 12:41:38,014 INFO [Leader Election Notification Thread-1]
> o.apache.nifi.controller.FlowController This node has been elected Primary
> Node
> 2017-05-18 12:41:38,353 DEBUG [Heartbeat Monitor Thread-1]
> o.a.n.c.c.h.AbstractHeartbeatMonitor Received no new heartbeats. Will not
> disconnect any nodes due to lack of heartbeat
> 2017-05-18 12:41:41,336 DEBUG [Process Cluster Protocol Request-3]
> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Received new heartbeat from
> centos-b:8080
> 2017-05-18 12:41:41,337 DEBUG [Process Cluster Protocol Request-3]
> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor 
> 
> Calculated diff between current cluster status and node cluster status as
> follows:
> Node: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
> updateId=42]]
> Self: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
> updateId=42]]
> Difference: []
> 
> 
> 2017-05-18 12:41:41,337 INFO [Process Cluster Protocol Request-3]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> 410e7db5-8bb0-4f97-8ee8-fc8647c54959 (type=HEARTBEAT, length=2341 bytes)
> from centos-b:8080 in 3 millis
> 2017-05-18 12:41:41,339 INFO [Clustering Tasks Thread-2]
> o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-05-18
> 12:41:41,330 and sent to centos-b:10001 at 2017-05-18 12:41:41,339; send
> took 8 millis
> 2017-05-18 12:41:43,354 INFO [Heartbeat Monitor Thread-1]
> o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in
> 93276 nanos
> 2017-05-18 12:41:46,346 DEBUG [Process Cluster Protocol Request-4]
> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor Received new heartbeat from
> centos-b:8080
> 2017-05-18 12:41:46,346 DEBUG [Process Cluster Protocol Request-4]
> o.a.n.c.c.h.ClusterProtocolHeartbeatMonitor 
> 
> Calculated diff between current cluster status and node cluster status as
> follows:
> Node: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
> updateId=42]]
> Self: [NodeConnectionStatus[nodeId=centos-b:8080, state=CONNECTED,
> updateId=45], NodeConnectionStatus[nodeId=centos-a:8080, state=CONNECTED,
> updateId=42]]
> Difference: []
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-nifi-users-list.2361937.n4.nabble.com/Nifi-Cluster-fails-to-disconnect-node-when-node-was-killed-tp1942p1950.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
