Yes, 'nifi.cluster.protocol.is.secure' is set to 'true', since otherwise, NiFi would require values for 'nifi.web.http.host' and 'nifi.web.http.port'. We have a cert that is used to serve HTTPS requests to the NiFi web UI, and it works just fine.
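Bryan, to make sure I'm reading your keystore/truststore point correctly, my understanding is that the following entries have to be populated on every node once nifi.cluster.protocol.is.secure=true. A sketch, with placeholder paths and passwords rather than our actual values:

  nifi.security.keystore=/opt/nifi-1.7.1/conf/keystore.jks
  nifi.security.keystoreType=JKS
  nifi.security.keystorePasswd=<keystore password>
  nifi.security.keyPasswd=<key password>
  nifi.security.truststore=/opt/nifi-1.7.1/conf/truststore.jks
  nifi.security.truststoreType=JKS
  nifi.security.truststorePasswd=<truststore password>

And if the cluster protocol performs mutual TLS, then presumably each node's certificate also needs the clientAuth extended key usage (not just serverAuth), and each node's certificate or its issuing CA must be present in every other node's truststore; a cert that merely serves the web UI over HTTPS wouldn't guarantee either.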
-----Original Message-----
From: Bryan Bende <[email protected]>
Sent: Monday, October 15, 2018 9:43 AM
To: [email protected]
Subject: Re: NiFi fails on cluster nodes

This is not related to ZooKeeper... I think you are missing something related to TLS/SSL configuration. Maybe you set the cluster protocol to be secure, but then didn't configure NiFi with a keystore/truststore?

On Mon, Oct 15, 2018 at 9:41 AM Mike Thomsen <[email protected]> wrote:
>
> Not sure what's going on here, but NiFi does not require a cert to set up ZooKeeper.
>
> Mike
>
> On Mon, Oct 15, 2018 at 9:39 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Hi Mike and Bryan,
>>
>> I've installed and started ZooKeeper 3.4.13 and re-started a single NiFi node so far. Here is the error from the NiFi log:
>>
>> 2018-10-15 09:19:48,371 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
>> 2018-10-15 09:19:48,425 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
>> 2018-10-15 09:19:48,452 ERROR [Process Cluster Protocol Request-2] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
>> 2018-10-15 09:19:48,456 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
>>
>> It is likely extraneous to NiFi, but does this mean that we need to install a cert into ZooKeeper? Right now, both apps are running on the same box.
>>
>> Thank you.
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Monday, October 15, 2018 9:02 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>>
>> See the properties that start with "nifi.zookeeper.".
>>
>> On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Mike,
>>
>> I wonder if you could point me to instructions on how to configure a cluster with an external instance of ZooKeeper? The NiFi Admin Guide talks exclusively about the embedded one.
>>
>> Thanks again.
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Friday, October 12, 2018 10:17 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> It very well could become a problem down the road. The reason ZooKeeper is usually on a dedicated machine is that you want it to have enough resources to always communicate within a quorum to reconcile configuration changes and feed configuration details to clients.
>>
>> That particular message is just a warning. From what I can tell, it's telling you that no cluster coordinator has been elected and that it's going to try to do something about that. It's usually a problem with embedded ZooKeeper because each node by default points to the version of ZooKeeper it fires up.
>>
>> For a development environment, a VM with 2GB of RAM and 1-2 CPU cores should be enough to run an external ZooKeeper.
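A note for the archive: the external-ZooKeeper wiring Mike points to above boils down to a few nifi.properties entries on every node. A minimal sketch, with a placeholder host (zk1.example.org is not a real machine here):

  nifi.state.management.embedded.zookeeper.start=false
  nifi.zookeeper.connect.string=zk1.example.org:2181
  nifi.zookeeper.root.node=/nifi

If I read the admin guide correctly, the Connect String property of the cluster provider in conf/state-management.xml has to point at the same ZooKeeper as well.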
>> On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Thanks Mike. We will get an external ZooKeeper instance deployed. I guess co-locating it with one of the NiFi nodes shouldn't be an issue, or will it? We are chronically short of hardware. BTW, does the following message in the logs point to some sort of problem with the embedded ZooKeeper?
>>
>> 2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
>> 2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
>> 2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>> 2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED
>>
>> From: Mike Thomsen <[email protected]>
>> Sent: Friday, October 12, 2018 8:33 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Also, in a production environment NiFi should have its own dedicated ZooKeeper cluster, to be on the safe side. You should not reuse ZooKeeper quorums (e.g., have HBase and NiFi point to the same quorum).
>>
>> On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <[email protected]> wrote:
>>
>> Alexander,
>>
>> I am pretty sure your problem is here: nifi.state.management.embedded.zookeeper.start=true
>>
>> That spins up an embedded ZooKeeper, which is generally intended to be used for local development. For example, HBase provides the same feature, but it is intended to let you test a real HBase client application against a single node of HBase running locally.
>>
>> What you need to do is try these steps:
>>
>> 1. Set up an external ZooKeeper instance (or set up 3 in a quorum; the count must be odd).
>> 2. Update nifi.properties on each node to use the external ZooKeeper setup.
>> 3. Restart all of them.
>>
>> See if that works.
>>
>> Mike
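For the quorum option in Mike's step 1, a minimal zoo.cfg sketch, identical on all three ZooKeeper hosts (hostnames and the data directory are placeholders):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zk1.example.org:2888:3888
  server.2=zk2.example.org:2888:3888
  server.3=zk3.example.org:2888:3888

Each host also needs a myid file under dataDir containing just its own server number (1, 2 or 3); 2888 carries quorum traffic and 3888 leader election, which is why 3888 shows up in the firewall discussion below.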
>> On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven't touched that property. Yesterday, we discovered some issues preventing two of the boxes from communicating. Now, they can talk okay. Ports 11443, 2181 and 3888 are explicitly open in iptables, but clustering still doesn't happen. The log files are filled with errors like this:
>>
>> 2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Is there anything else we should check?
>>
>> From: Nathan Gough <[email protected]>
>> Sent: Thursday, October 11, 2018 9:12 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> You may also need to explicitly open 'nifi.cluster.node.protocol.port' on all nodes to allow cluster communication for cluster heartbeats etc.
>>
>> From: ashmeet kandhari <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Thursday, October 11, 2018 at 9:09 AM
>> To: <[email protected]>
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Hi Alexander,
>>
>> Can you verify that the 3 nodes can reach each other (a TCP ping), or run NiFi in standalone mode and see whether you can ping each node from the other 2 servers, just to be sure they can communicate with one another?
>>
>> On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> How do I do that? The nifi.properties file on each node includes 'nifi.state.management.embedded.zookeeper.start=true', so I assume ZooKeeper does start.
>>
>> From: ashmeet kandhari <[email protected]>
>> Sent: Thursday, October 11, 2018 4:36 AM
>> To: [email protected]
>> Subject: Re: NiFi fails on cluster nodes
>>
>> Can you see if the ZooKeeper node is up and running and can connect to the NiFi nodes?
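On the "is ZooKeeper up and reachable" question, a quick sanity check from each box might look like the following (hostnames are placeholders, and this assumes a netcat that supports -z; ruok is one of ZooKeeper's four-letter-word admin commands, to which a healthy server answers imok):

  echo ruok | nc nifi-node1.example.org 2181
  nc -zv nifi-node2.example.org 11443
  nc -zv nifi-node2.example.org 3888

The first command exercises ZooKeeper itself; the other two merely confirm that the cluster-protocol and leader-election ports accept TCP connections through iptables.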
>> On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <[email protected]> wrote:
>>
>> Hello,
>>
>> We have three NiFi 1.7.1 nodes originally configured as independent instances, each on its own server. There is no firewall between them. When I tried to build a cluster following the instructions here, NiFi failed to start on all of them, despite the fact that I even set nifi.cluster.protocol.is.secure=false in the nifi.properties file on each node. Here is the error in the log files:
>>
>> 2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
>> 2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
>> 2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
>> 2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
>> 2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
>> 2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
>> java.net.ConnectException: Connection timed out (Connection timed out)
>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>     at java.net.Socket.connect(Socket.java:589)
>>     at java.net.Socket.connect(Socket.java:538)
>>     at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
>>     at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
>>     at org.apache.nifi.NiFi.<init>(NiFi.java:102)
>>     at org.apache.nifi.NiFi.<init>(NiFi.java:71)
>>     at org.apache.nifi.NiFi.main(NiFi.java:292)
>> 2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
>> 2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
>>
>> Without clustering, the instances had no problem starting. Since this is our first experiment building a cluster, I'm not sure where to look for clues.
>>
>> Thanks in advance,
>>
>> Alexander
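One closing note for the archive: the stack trace above shows NiFi timing out while connecting back to its own bootstrap process on localhost (BootstrapListener.sendCommand), before clustering ever enters the picture. A guess rather than a diagnosis, but a firewall policy that drops loopback traffic would produce exactly this, so it may be worth ruling out:

  sudo iptables -L INPUT -n -v | head       # is traffic on interface lo accepted?
  sudo iptables -I INPUT 1 -i lo -j ACCEPT  # if not, allow loopback and retry

The other usual suspect is name resolution: each node's own hostname should resolve to a reachable address in /etc/hosts.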
