Hello Wyll,

It would help if you could send your nifi.properties.
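The cluster and ZooKeeper entries are probably the most relevant part of that file. As a rough sketch of the lines I would look at first (the node address is a made-up placeholder, the ZooKeeper hostnames are only guessed from the log in your message, and the other values are illustrative rather than your actual settings):

```
# Illustrative values only; not taken from the real nifi.properties
nifi.cluster.is.node=true
# Hypothetical pod hostname; each node should advertise its own address
nifi.cluster.node.address=nifi-0.nifi-hs.example.svc.cluster.local
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# Hostnames assumed from the EnsembleTracker log lines, client port 2181
nifi.zookeeper.connect.string=zk-0.zk-hs.ki.svc.cluster.local:2181,zk-1.zk-hs.ki.svc.cluster.local:2181,zk-2.zk-hs.ki.svc.cluster.local:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
# External ZooKeeper, so the embedded one should be off
nifi.state.management.embedded.zookeeper.start=false
```

Seeing how these are set on each of the three pods, and whether they differ between pods, should make it easier to tell why a node fails to rejoin.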
Thanks,
Sushil Kumar

On Tue, Sep 29, 2020 at 7:58 AM Wyll Ingersoll <[email protected]> wrote:

> I have a 3-node NiFi (1.11.4) cluster in a Kubernetes environment (as a
> StatefulSet) using an external ZooKeeper (also 3 nodes) to manage state.
>
> Whenever even 1 node (pod/container) goes down or is restarted, it can
> throw the whole cluster into a bad state that forces me to restart ALL of
> the pods in order to recover. This seems wrong. The problem seems to be
> that when the primary node goes away, the remaining 2 nodes don't ever try
> to take over. Instead, I have to restart all of them individually until one
> of them becomes the primary, then the other 2 eventually join and sync up.
>
> When one of the nodes is refusing to sync up, I often see these errors in
> the log and the only way to get it back into the cluster is to restart it.
> The node showing the errors below never seems to be able to rejoin or
> resync with the other 2 nodes.
>
> 2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
> org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
>     at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
>     at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
>     at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
>     ... 4 common frames omitted
>
> 2020-09-29 10:18:53,326 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Starting
>
> 2020-09-29 10:18:53,327 INFO [Reconnect to Cluster] org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
>
> 2020-09-29 10:18:53,328 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Default schema
>
> 2020-09-29 10:18:53,807 INFO [Reconnect to Cluster-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
>
> 2020-09-29 10:18:53,809 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received:
> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181, version=0,
> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181,
> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181}
>
> 2020-09-29 10:18:53,810 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
>
> 2020-09-29 10:18:53,813 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received:
> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181, version=0,
> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181,
> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181}
>
> 2020-09-29 10:18:54,323 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Primary Node' becuase that role is not registered
>
> 2020-09-29 10:18:54,324 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Cluster Coordinator' becuase that role is not registered
