Re: NiFi fails on cluster nodes

Bryan Bende Mon, 15 Oct 2018 06:03:38 -0700

The cluster configuration section of the admin guide [1] applies
regardless of whether you use embedded or external ZooKeeper.


The only real difference is that you won't set
nifi.state.management.embedded.zookeeper.start=true; besides that,
all of the other config would be the same whether you use embedded
or external ZooKeeper.
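As a rough sketch of what this means in practice, the Python below assembles the cluster-related nifi.properties entries for one node that points at an external ZooKeeper. The property keys are standard nifi.properties names; the host names (nifi1.example.com, zk1.example.com, ...) and the helper names are hypothetical, and 11443 is simply the protocol port used later in this thread. Security settings are left out, and the ZooKeeper state provider in conf/state-management.xml has its own "Connect String" property that should list the same servers.

```python
def cluster_properties(node_host, zk_hosts, protocol_port=11443, zk_client_port=2181):
    """Build cluster-related nifi.properties entries for one node (hypothetical helper)."""
    # Every node gets the same comma-separated list of external ZooKeeper servers.
    connect_string = ",".join(f"{host}:{zk_client_port}" for host in zk_hosts)
    return {
        # External ZooKeeper: do not start the embedded server on this node.
        "nifi.state.management.embedded.zookeeper.start": "false",
        "nifi.cluster.is.node": "true",
        "nifi.cluster.node.address": node_host,
        "nifi.cluster.node.protocol.port": str(protocol_port),
        "nifi.zookeeper.connect.string": connect_string,
    }


def render(props):
    """Render the entries in key=value form, one per line, as in nifi.properties."""
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

For example, `print(render(cluster_properties("nifi1.example.com", ["zk1.example.com", "zk2.example.com", "zk3.example.com"])))` prints the block to merge into node 1's nifi.properties; only `node_host` changes from node to node.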

[1] 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering
On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
>
> Mike,
>
>
>
> I wonder if you could point me to instructions on how to configure a cluster
> with an external instance of ZooKeeper? The NiFi Admin Guide talks
> exclusively about the embedded one.
>
>
>
> Thanks again.
>
>
>
> From: Mike Thomsen <mikerthom...@gmail.com>
> Sent: Friday, October 12, 2018 10:17 AM
> To: users@nifi.apache.org
> Subject: Re: NiFi fails on cluster nodes
>
>
>
> It very well could become a problem down the road. The reason ZooKeeper is
> usually on a dedicated machine is that you want it to always have enough
> resources to communicate within its quorum, reconcile configuration
> changes, and feed configuration details to clients.
>
>
>
> That particular message is just a warning. From what I can tell, it's
> telling you that no cluster coordinator has been elected and that NiFi is
> going to try to do something about that. It's usually a problem with embedded
> ZooKeeper because each node by default points to the ZooKeeper instance it
> fires up.
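One way to catch the situation described here, where each node talks only to the ZooKeeper it started, is to compare the nifi.zookeeper.connect.string value across all nodes. The sketch below is a hypothetical sanity check, assuming you have already collected each node's value into a dict keyed by node name; it flags empty values, entries that point at localhost, and nodes that disagree on the ensemble.

```python
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def connect_string_problems(node_settings):
    """Return human-readable problems found in per-node ZooKeeper connect strings.

    node_settings maps a node name to its nifi.zookeeper.connect.string value
    (hypothetical input gathered by hand or by script).
    """
    problems = []
    normalized = {}
    for node, connect_string in node_settings.items():
        # Order of servers does not matter, so compare sorted host:port entries.
        entries = sorted(part.strip() for part in connect_string.split(",") if part.strip())
        if not entries:
            problems.append(f"{node}: empty connect string")
            continue
        for entry in entries:
            host = entry.rsplit(":", 1)[0]
            if host in LOCAL_HOSTS:
                problems.append(f"{node}: points at local ZooKeeper {entry}")
        normalized[node] = tuple(entries)
    # All nodes must share one ensemble; otherwise they never see each other's state.
    if len(set(normalized.values())) > 1:
        problems.append("nodes disagree on the ZooKeeper ensemble")
    return problems
```

An empty result for `{"nifi1": "zk1:2181,zk2:2181", "nifi2": "zk2:2181, zk1:2181"}` means both nodes name the same ensemble; a `localhost` entry on any node is worth a second look in a multi-server cluster.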
>
>
>
> For a development environment, a VM with 2GB of RAM and 1-2 CPU cores should 
> be enough to run an external ZooKeeper.
>
>
>
> On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C] 
> <alexander.s...@nih.gov> wrote:
>
> Thanks Mike. We will get an external ZooKeeper instance deployed. I guess
> co-locating it with one of the NiFi nodes shouldn’t be an issue, or would it?
> We are chronically short of hardware. BTW, does the following message in the
> logs point to some sort of problem with the embedded ZooKeeper?
>
>
>
> 2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
>
> 2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
>
> 2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>
> 2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED
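When reading logs like these, the "State change" entries come from Apache Curator, the ZooKeeper client behind NiFi's CuratorLeaderElectionManager: SUSPENDED means the connection to ZooKeeper was interrupted, RECONNECTED means it came back, and LOST means Curator has given up on the session. The hypothetical helper below pulls those transitions out of nifi-app.log lines, assuming the log layout shown in the excerpt above; the "still suspended or lost" test is a heuristic, not something NiFi itself reports.

```python
import re

# Layout seen in the excerpt: "<date> <time> <LEVEL> [<thread>] <logger> <message>"
LOG_LINE = re.compile(r"^(\S+ \S+) (\w+) \[([^\]]+)\] (\S+) (.*)$")
# Curator's ConnectionStateManager logs transitions as "State change: <STATE>".
STATE_CHANGE = re.compile(r"State change: (\w+)")


def curator_state_changes(lines):
    """Return (timestamp, state) pairs for each Curator connection-state change."""
    changes = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # stack frames and wrapped continuation lines are skipped
        timestamp, _level, _thread, _logger, message = match.groups()
        state = STATE_CHANGE.search(message)
        if state:
            changes.append((timestamp, state.group(1)))
    return changes


def connection_unhealthy(changes):
    """Heuristic: the session was LOST at some point, or the log ends while SUSPENDED."""
    states = [state for _, state in changes]
    return "LOST" in states or (bool(states) and states[-1] == "SUSPENDED")
```

A log that keeps alternating between SUSPENDED and RECONNECTED, or ends on SUSPENDED as this excerpt does, points at the ZooKeeper connection rather than at NiFi's coordinator election itself.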
>
>
>
> From: Mike Thomsen <mikerthom...@gmail.com>
> Sent: Friday, October 12, 2018 8:33 AM
> To: users@nifi.apache.org
> Subject: Re: NiFi fails on cluster nodes
>
>
>
> Also, in a production environment NiFi should have its own dedicated
> ZooKeeper cluster to be on the safe side. You should not reuse ZooKeeper
> quorums (e.g., have HBase and NiFi point to the same quorum).
>
>
>
> On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
>
> Alexander,
>
>
>
> I am pretty sure your problem is here: 
> nifi.state.management.embedded.zookeeper.start=true
>
>
>
> That spins up an embedded ZooKeeper, which is generally intended to be used 
> for local development. For example, HBase provides the same feature, but it 
> is intended to allow you to test a real HBase client application against a 
> single node of HBase running locally.
>
>
>
> What you need to try are these steps:
>
>
>
> 1. Set up an external ZooKeeper instance (or set up 3 in a quorum; the
> quorum size should be an odd number)
>
> 2. Update nifi.properties on each node to use the external ZooKeeper setup.
>
> 3. Restart all of them.
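The odd-number advice comes from how ZooKeeper forms a quorum: it needs a strict majority of the configured servers to be up and talking to each other. A small Python sketch of that arithmetic (the helper names are illustrative only):

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a strict majority (a quorum)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while the rest can still form a quorum."""
    return ensemble_size - majority(ensemble_size)


for n in range(1, 7):
    print(f"{n} servers: quorum = {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With 3 servers the quorum is 2 and one failure is tolerated; a 4th server raises the quorum to 3 but still tolerates only one failure (likewise 5 and 6 both tolerate two), which is why odd ensemble sizes are recommended.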
>
>
>
> See if that works.
>
>
>
> Mike
>
>
>
> On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] 
> <alexander.s...@nih.gov> wrote:
>
> nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven’t
> touched that property. Yesterday, we discovered some issues preventing two of
> the boxes from communicating. Now, they can talk okay. Ports 11443, 2181 and
> 3888 are explicitly open in iptables, but clustering still doesn’t happen.
> The log files are filling up with errors like this:
>
>
>
> 2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
>
>
> Is there anything else we should check?
>
>
>
> From: Nathan Gough <thena...@gmail.com>
> Sent: Thursday, October 11, 2018 9:12 AM
> To: users@nifi.apache.org
> Subject: Re: NiFi fails on cluster nodes
>
>
>
> You may also need to explicitly open ‘nifi.cluster.node.protocol.port’ on all
> nodes to allow cluster communication (heartbeats, etc.).
>
>
>
> From: ashmeet kandhari <ashmeetkandhar...@gmail.com>
> Reply-To: <users@nifi.apache.org>
> Date: Thursday, October 11, 2018 at 9:09 AM
> To: <users@nifi.apache.org>
> Subject: Re: NiFi fails on cluster nodes
>
>
>
> Hi Alexander,
>
>
>
> Can you verify that the 3 nodes can reach one another, either with a TCP ping
> or by running NiFi in standalone mode and checking that you can reach each
> instance from the other 2 servers, just to be sure they can communicate?
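A minimal way to do the TCP check suggested here, assuming Python 3 is available on each server, is to try opening a TCP connection to every host/port pair; the host names and helper names below are hypothetical. Besides 11443 (nifi.cluster.node.protocol.port), 2181 and 3888 mentioned in this thread, ZooKeeper ensembles conventionally use 2888 for follower-to-leader traffic (the middle port in `server.N=host:2888:3888` entries in zookeeper.properties), so that port may need to be open as well. Only the current ZooKeeper leader listens on it, though, so 2888 showing as unreachable on a follower is expected.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (firewall drops) and DNS failures.
        return False


def check_ports(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): tcp_reachable(host, port, timeout) for host in hosts for port in ports}
```

For example, running `check_ports(["nifi1.example.com", "nifi2.example.com", "nifi3.example.com"], [11443, 2181, 2888, 3888])` from each of the three servers and looking for `False` values shows which paths are blocked. A refused port and a silently dropped one both report `False`; the dropped one just takes the full timeout.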
>
>
>
> On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] 
> <alexander.s...@nih.gov> wrote:
>
> How do I do that? The nifi.properties file on each node includes
> ‘nifi.state.management.embedded.zookeeper.start=true’, so I assume ZooKeeper
> does start.
>
>
>
> From: ashmeet kandhari <ashmeetkandhar...@gmail.com>
> Sent: Thursday, October 11, 2018 4:36 AM
> To: users@nifi.apache.org
> Subject: Re: NiFi fails on cluster nodes
>
>
>
> Can you check whether the ZooKeeper node is up and running and can connect
> to the NiFi nodes?
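To check whether ZooKeeper itself is answering (not just whether its port is open), one option is ZooKeeper's four-letter-word admin commands: sending `ruok` to the client port should return `imok` from a running server, which then closes the connection. This is a sketch with hypothetical helper names. ZooKeeper 3.5 and later only answer commands listed in `4lw.commands.whitelist`, and `imok` only means the server process is running, not that it has joined a quorum (the `srvr` command's `Mode:` line shows that).

```python
import socket


def zk_four_letter(host, port=2181, cmd=b"ruok", timeout=3.0):
    """Send a four-letter-word command to ZooKeeper and return the full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        # ZooKeeper closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def zk_is_ok(host, port=2181, timeout=3.0):
    """True if the server at host:port answers 'ruok' with 'imok'."""
    try:
        return zk_four_letter(host, port, b"ruok", timeout) == "imok"
    except OSError:
        return False
```

Running `zk_is_ok(...)` against port 2181 on each node from the other servers checks both that ZooKeeper is up and that the NiFi hosts can reach it; `print(zk_four_letter(host, cmd=b"srvr"))` then shows whether each server is a leader, follower or standalone.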
>
>
>
> On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] 
> <alexander.s...@nih.gov> wrote:
>
> Hello,
>
>
>
> We have three NiFi 1.7.1 nodes originally configured as independent
> instances, each on its own server. There is no firewall between them. When I
> tried to build a cluster following instructions here, NiFi failed to start on
> all of them, even though I set nifi.cluster.protocol.is.secure=false in the
> nifi.properties file on each node. Here is the error in the log files:
>
>
>
> 2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
>
> 2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
>
> 2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
>
> 2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
>
> 2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
>
> 2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
>
> java.net.ConnectException: Connection timed out (Connection timed out)
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at java.net.Socket.connect(Socket.java:538)
>         at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
>         at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
>         at org.apache.nifi.NiFi.<init>(NiFi.java:102)
>         at org.apache.nifi.NiFi.<init>(NiFi.java:71)
>         at org.apache.nifi.NiFi.main(NiFi.java:292)
>
> 2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
>
> 2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
>
>
>
> Without clustering, the instances had no problem starting. Since this is our 
> first experiment building a cluster, I’m not sure where to look for clues.
>
>
>
> Thanks in advance,
>
>
>
> Alexander
