NiFi Cluster with lots of SUSPENDED, RECONNECTED, LOST events

ddewaele Tue, 13 Jun 2017 14:52:01 -0700

We have a two-node NiFi cluster running with three ZooKeeper instances (replicated)
in a Docker Swarm cluster.


Most of the time the cluster operates fine, but from time to time we notice
that NiFi stops processing messages completely. It eventually resumes after
a while (sometimes after a couple of seconds, sometimes after a couple of
minutes).

When I run grep o.a.n.c.l.e.CuratorLeaderElectionManager
/srv/nifi/logs/nifi-app.log on the primary node, I see a lot of SUSPENDED /
RECONNECTED messages.
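To put a number on "a lot", a small script can tally how often each Curator connection state shows up in those leader-election log lines. This is a minimal sketch, not something shipped with NiFi. It assumes the state name (one of Curator's ConnectionState values: CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY) appears as an upper-case word on each matching line. The exact log format is not shown in this post, and the function name count_state_changes is hypothetical.

```python
import re
import sys
from collections import Counter

# Curator ConnectionState names. ASSUMPTION: each relevant log line contains
# the state as a bare upper-case word (the exact message format may differ).
STATES = ("CONNECTED", "SUSPENDED", "RECONNECTED", "LOST", "READ_ONLY")
STATE_RE = re.compile(r"\b(" + "|".join(STATES) + r")\b")
LOGGER = "o.a.n.c.l.e.CuratorLeaderElectionManager"


def count_state_changes(lines):
    """Hypothetical helper: tally connection states in leader-election lines."""
    counts = Counter()
    for line in lines:
        if LOGGER not in line:
            continue  # same filter as the grep above
        match = STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Only runs when a log path is passed explicitly, e.g.
#   python count_states.py /srv/nifi/logs/nifi-app.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for state, n in count_state_changes(log).most_common():
            print(f"{state:12} {n}")
```

Running it on each node (the script name count_states.py is also made up) gives a per-state total that is easier to compare between nodes than scrolling through grep output.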




Likewise, on the other node I see similar messages:



The only real exceptions I'm seeing in the logs are these:



I also see this in the UI from time to time:

com.sun.jersey.api.client.ClientHandlerException:
java.net.SocketTimeoutException: Read timed out

Is there anything I can do to debug this further?
Is it normal to see that many connection state changes? (The logs are full
of them.)
The solution is running on 3 VMs using Docker Swarm. NiFi is running on 2
of those 3 VMs, and we have a ZooKeeper ensemble running on all 3 VMs.
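One way to check whether the SUSPENDED / LOST events line up with the periods where processing stops is to bucket them per minute. Another minimal sketch under stated assumptions: each log line is assumed to start with a timestamp in the yyyy-MM-dd HH:mm:ss,SSS form produced by NiFi's default logback.xml pattern (worth verifying against your own logback.xml), and the function name disruptions_per_minute is hypothetical.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: lines start with "yyyy-MM-dd HH:mm:ss,SSS"; group 1 keeps
# everything up to the minute so events are bucketed per minute.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
DISRUPTIVE_RE = re.compile(r"\b(SUSPENDED|LOST)\b")
LOGGER = "o.a.n.c.l.e.CuratorLeaderElectionManager"


def disruptions_per_minute(lines):
    """Hypothetical helper: count SUSPENDED/LOST leader-election events per minute."""
    per_minute = Counter()
    for line in lines:
        if LOGGER not in line or not DISRUPTIVE_RE.search(line):
            continue
        ts = TIMESTAMP_RE.match(line)
        if ts:  # skip continuation lines (e.g. stack traces) without a timestamp
            per_minute[ts.group(1)] += 1
    return per_minute


# Only runs when a log path is passed explicitly.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, n in sorted(disruptions_per_minute(log).items()):
            print(minute, n)
```

The per-minute counts can then be compared with the times at which NiFi stopped processing, and with the same window on the other node.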

I don't see any errors in the ZooKeeper logs.






--
View this message in context: 
http://apache-nifi-users-list.2361937.n4.nabble.com/NiFi-Cluster-with-lots-of-SUSPENDED-RECONNECTED-LOST-events-tp2194.html
Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
