Yes, 'nifi.cluster.protocol.is.secure' is set to 'true', since otherwise, NiFi would require values for 'nifi.web.http.host' and 'nifi.web.http.port'. We have a cert that is used to serve HTTPS requests to the NiFi web UI, and it works just fine.
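Bryan, to make sure I'm reading your keystore/truststore point correctly, my understanding is that the following entries have to be populated on every node once nifi.cluster.protocol.is.secure=true. A sketch, with placeholder paths and passwords rather than our actual values:

  nifi.security.keystore=/opt/nifi-1.7.1/conf/keystore.jks
  nifi.security.keystoreType=JKS
  nifi.security.keystorePasswd=<keystore password>
  nifi.security.keyPasswd=<key password>
  nifi.security.truststore=/opt/nifi-1.7.1/conf/truststore.jks
  nifi.security.truststoreType=JKS
  nifi.security.truststorePasswd=<truststore password>

And if the cluster protocol performs mutual TLS, then presumably each node's certificate also needs the clientAuth extended key usage (not just serverAuth), and each node's certificate or its issuing CA must be present in every other node's truststore; a cert that merely serves the web UI over HTTPS wouldn't guarantee either.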
-----Original Message-----
From: Bryan Bende <[email protected]>
Sent: Monday, October 15, 2018 9:43 AM
To: [email protected]
Subject: Re: NiFi fails on cluster nodes

This is not related to ZooKeeper... I think you are missing something related to TLS/SSL configuration. Maybe you set the cluster protocol to be secure, but then didn't configure NiFi with a keystore/truststore?

On Mon, Oct 15, 2018 at 9:41 AM Mike Thomsen <[email protected]> wrote:
>
> Not sure what's going on here, but NiFi does not require a cert to set up ZooKeeper.
>
> Mike
>
> On Mon, Oct 15, 2018 at 9:39 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Hi Mike and Bryan,
>>
>> I've installed and started ZooKeeper 3.4.13 and re-started a single NiFi node so far. Here is the error from the NiFi log:
>>
>> 2018-10-15 09:19:48,371 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
>> 2018-10-15 09:19:48,425 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
>> 2018-10-15 09:19:48,452 ERROR [Process Cluster Protocol Request-2] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
>> 2018-10-15 09:19:48,456 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
>>
>> It is likely extraneous to NiFi, but does this mean that we need to install a cert into ZooKeeper? Right now, both apps are running on the same box.
>>
>> Thank you.
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Monday, October 15, 2018 9:02 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>
>> See the properties that start with "nifi.zookeeper.".
>>
>> On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Mike,
>>
>> I wonder if you could point me to instructions on how to configure a cluster with an external instance of ZooKeeper? The NiFi Admin Guide talks exclusively about the embedded one.
>>
>> Thanks again.
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Friday, October 12, 2018 10:17 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> It very well could become a problem down the road. The reason ZooKeeper is usually on a dedicated machine is that you want it to have enough resources to always communicate within a quorum to reconcile configuration changes and feed configuration details to clients.
>>
>> That particular message is just a warning. From what I can tell, it's telling you that no cluster coordinator has been elected and that it's going to try to do something about that. It's usually a problem with embedded ZooKeeper because each node by default points to the version of ZooKeeper it fires up.
>>
>> For a development environment, a VM with 2GB of RAM and 1-2 CPU cores should be enough to run an external ZooKeeper.
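A note for the archive: the external-ZooKeeper wiring Mike points to above boils down to a few nifi.properties entries on every node. A minimal sketch, with a placeholder host (zk1.example.org is not a real machine here):

  nifi.state.management.embedded.zookeeper.start=false
  nifi.zookeeper.connect.string=zk1.example.org:2181
  nifi.zookeeper.root.node=/nifi

If I read the admin guide correctly, the Connect String property of the cluster provider in conf/state-management.xml has to point at the same ZooKeeper as well.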
>> On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Thanks Mike. We will get an external ZooKeeper instance deployed. I guess co-locating it with one of the NiFi nodes shouldn't be an issue, or will it? We are chronically short of hardware. BTW, does the following message in the logs point to some sort of problem with the embedded ZooKeeper?
>>
>> 2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
>> 2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
>> 2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>> 2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Friday, October 12, 2018 8:33 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Also, in a production environment NiFi should have its own dedicated ZooKeeper cluster, to be on the safe side. You should not reuse ZooKeeper quorums (e.g., have HBase and NiFi point to the same quorum).
>>
>> On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <[email protected]> wrote:
>>
>> Alexander,
>>
>> I am pretty sure your problem is here: nifi.state.management.embedded.zookeeper.start=true
>>
>> That spins up an embedded ZooKeeper, which is generally intended to be used for local development. For example, HBase provides the same feature, but it is intended to let you test a real HBase client application against a single node of HBase running locally.
>>
>> What you need to do is try these steps:
>>
>> 1. Set up an external ZooKeeper instance (or set up 3 in a quorum; the count must be odd).
>> 2. Update nifi.properties on each node to use the external ZooKeeper setup.
>> 3. Restart all of them.
>>
>> See if that works.
>>
>> Mike
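For the quorum option in Mike's step 1, a minimal zoo.cfg sketch, identical on all three ZooKeeper hosts (hostnames and the data directory are placeholders):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zk1.example.org:2888:3888
  server.2=zk2.example.org:2888:3888
  server.3=zk3.example.org:2888:3888

Each host also needs a myid file under dataDir containing just its own server number (1, 2 or 3); 2888 carries quorum traffic and 3888 leader election, which is why 3888 shows up in the firewall discussion below.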
>> On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven't touched that property. Yesterday, we discovered some issues preventing two of the boxes from communicating. Now, they can talk okay. Ports 11443, 2181 and 3888 are explicitly open in iptables, but clustering still doesn't happen. The log files are filled with errors like this:
>>
>> 2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Is there anything else we should check?
>>
>> From: Nathan Gough <[email protected]>
>> Sent: Thursday, October 11, 2018 9:12 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> You may also need to explicitly open 'nifi.cluster.node.protocol.port' on all nodes to allow cluster communication for cluster heartbeats etc.
>>
>> From: ashmeet kandhari <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Thursday, October 11, 2018 at 9:09 AM
>> To: <[email protected]>
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Hi Alexander,
>>
>> Can you verify that the 3 nodes can reach each other (a TCP ping), or run NiFi in standalone mode and see whether you can ping each node from the other 2 servers, just to be sure they can communicate with one another?
>>
>> On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> How do I do that? The nifi.properties file on each node includes 'nifi.state.management.embedded.zookeeper.start=true', so I assume ZooKeeper does start.
>>
>> From: ashmeet kandhari <[email protected]>
>> Sent: Thursday, October 11, 2018 4:36 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Can you see if the ZooKeeper node is up and running and can connect to the NiFi nodes?
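On the "is ZooKeeper up and reachable" question, a quick sanity check from each box might look like the following (hostnames are placeholders, and this assumes a netcat that supports -z; ruok is one of ZooKeeper's four-letter-word admin commands, to which a healthy server answers imok):

  echo ruok | nc nifi-node1.example.org 2181
  nc -zv nifi-node2.example.org 11443
  nc -zv nifi-node2.example.org 3888

The first command exercises ZooKeeper itself; the other two merely confirm that the cluster-protocol and leader-election ports accept TCP connections through iptables.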
>> On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Hello,
>>
>> We have three NiFi 1.7.1 nodes originally configured as independent instances, each on its own server. There is no firewall between them. When I tried to build a cluster following the instructions here, NiFi failed to start on all of them, despite the fact that I even set nifi.cluster.protocol.is.secure=false in the nifi.properties file on each node. Here is the error in the log files:
>>
>> 2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
>> 2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
>> 2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
>> 2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
>> 2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
>> 2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
>> java.net.ConnectException: Connection timed out (Connection timed out)
>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>     at java.net.Socket.connect(Socket.java:589)
>>     at java.net.Socket.connect(Socket.java:538)
>>     at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
>>     at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
>>     at org.apache.nifi.NiFi.<init>(NiFi.java:102)
>>     at org.apache.nifi.NiFi.<init>(NiFi.java:71)
>>     at org.apache.nifi.NiFi.main(NiFi.java:292)
>> 2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
>> 2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
>>
>> Without clustering, the instances had no problem starting. Since this is our first experiment building a cluster, I'm not sure where to look for clues.
>>
>> Thanks in advance,
>>
>> Alexander
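One closing note for the archive: the stack trace above shows NiFi timing out while connecting back to its own bootstrap process on localhost (BootstrapListener.sendCommand), before clustering ever enters the picture. A guess rather than a diagnosis, but a firewall policy that drops loopback traffic would produce exactly this, so it may be worth ruling out:

  sudo iptables -L INPUT -n -v | head       # is traffic on interface lo accepted?
  sudo iptables -I INPUT 1 -i lo -j ACCEPT  # if not, allow loopback and retry

The other usual suspect is name resolution: each node's own hostname should resolve to a reachable address in /etc/hosts.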
