Hello Wyll

It would help if you could send your nifi.properties file.
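
If you would rather not post the whole file, a minimal sketch like the one below could pull out just the clustering-related entries. The prefix list and the masking rule are my own assumptions about what is relevant and what is sensitive, not an official list, so please adjust them and check the output before sending.

```python
import re

# Assumption: these prefixes cover the clustering-related settings.
RELEVANT_PREFIXES = ("nifi.cluster.", "nifi.zookeeper.", "nifi.state.management.")

# Heuristic only: mask values whose key looks like it holds a credential.
SENSITIVE = re.compile(r"passw|secret", re.IGNORECASE)


def extract_cluster_properties(text):
    """Return the cluster-related 'key=value' lines from nifi.properties text."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if not key.startswith(RELEVANT_PREFIXES):
            continue
        if SENSITIVE.search(key) and value.strip():
            value = "****"
        kept.append(f"{key}={value}")
    return kept
```

For example, `print("\n".join(extract_cluster_properties(open("conf/nifi.properties").read())))`, where the path is a guess at your layout.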

Thanks
Sushil Kumar

On Tue, Sep 29, 2020 at 7:58 AM Wyll Ingersoll <
[email protected]> wrote:

>
> I have a 3-node NiFi (1.11.4) cluster in a Kubernetes environment (as a
> StatefulSet) using an external ZooKeeper ensemble (also 3 nodes) to manage state.
>
> Whenever even 1 node (pod/container) goes down or is restarted, it can
> throw the whole cluster into a bad state that forces me to restart ALL of
> the pods in order to recover.  This seems wrong.  The problem seems to be
> that when the primary node goes away, the remaining 2 nodes don't ever try
> to take over.  Instead, I have to restart all of them individually until one
> of them becomes the primary, then the other 2 eventually join and sync up.
>
> When one of the nodes is refusing to sync up, I often see these errors in
> the log and the only way to get it back into the cluster is to restart it.
> The node showing the errors below never seems to be able to rejoin or
> resync with the other 2 nodes.
>
>
> 2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster]
> o.a.nifi.controller.StandardFlowService Handling reconnection request
> failed due to: org.apache.nifi.cluster.ConnectionException: Failed to
> connect node to cluster due to: java.lang.NullPointerException
> org.apache.nifi.cluster.ConnectionException: Failed to connect node to
> cluster due to: java.lang.NullPointerException
>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
>     at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
>     at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
>     at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
>     ... 4 common frames omitted
>
> 2020-09-29 10:18:53,326 INFO [Reconnect to Cluster]
> o.a.c.f.imps.CuratorFrameworkImpl Starting
>
> 2020-09-29 10:18:53,327 INFO [Reconnect to Cluster]
> org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
>
> 2020-09-29 10:18:53,328 INFO [Reconnect to Cluster]
> o.a.c.f.imps.CuratorFrameworkImpl Default schema
>
> 2020-09-29 10:18:53,807 INFO [Reconnect to Cluster-EventThread]
> o.a.c.f.state.ConnectionStateManager State change: CONNECTED
>
> 2020-09-29 10:18:53,809 INFO [Reconnect to Cluster-EventThread]
> o.a.c.framework.imps.EnsembleTracker New config event received:
> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181, version=0,
> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181,
> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181}
>
> 2020-09-29 10:18:53,810 INFO [Curator-Framework-0]
> o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
>
> 2020-09-29 10:18:53,813 INFO [Reconnect to Cluster-EventThread]
> o.a.c.framework.imps.EnsembleTracker New config event received:
> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181, version=0,
> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181,
> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
> 0.0.0.0:2181}
>
> 2020-09-29 10:18:54,323 INFO [Reconnect to Cluster]
> o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election
> Role 'Primary Node' becuase that role is not registered
>
> 2020-09-29 10:18:54,324 INFO [Reconnect to Cluster]
> o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election
> Role 'Cluster Coordinator' becuase that role is not registered
>
>
