Hi Mark

In version 1.13.2 (at least) the file
"main/nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java"
is looking for a property called "nifi.cluster.load.balance.address", which has
been reported in https://issues.apache.org/jira/browse/NIFI-8643 and fixed in
version 1.14.0.
In version 1.14.0 the only way I can get it to work is if I type in the IP
address. If I don't specify it, or type in the FQDN, the load balance port
will bind to localhost, which has been reported in
https://issues.apache.org/jira/browse/NIFI-9010

The result from running netstat:

netstat -l
tcp        0      0 localhost:6342          0.0.0.0:*               LISTEN

Kind regards
Jens M. Kofoed

On Thu, 5 Aug 2021 at 23:08, Mark Payne <[email protected]> wrote:

> Axel,
>
> I think that I can help clarify some of these things.
>
> First of all: nifi.cluster.load.balance.host vs. nifi.cluster.load.balance.address
>
> * The nifi.cluster.load.balance.host property is what matters.
>
> * The nifi.cluster.load.balance.address is not a real property. NiFi has
> never looked at this property. However, in the first release that included
> load balancing, there was a typo in which the nifi.properties file had
> “…address” instead of “…host”. This was later addressed.
>
> * So if you have a value for “nifi.cluster.load.balance.address”, it does
> nothing and is always ignored.
>
> Next: the nifi.cluster.load.balance.host property
>
> * nifi.cluster.load.balance.host can be either an IP address or a
> hostname. But if set, other nodes in the cluster MUST be able to
> communicate with the node using whatever value you put here, so using a
> value of 0.0.0.0 will not work. Also, if set, NiFi will listen for
> incoming connections ONLY on that hostname. If you set it to “localhost”,
> for instance, no other node can connect to it, because no other host can
> reach the node using “localhost”. So this needs to be an address that the
> NiFi instance knows about / can bind to, and that other nodes in the
> cluster can connect to.
>
> * If nifi.cluster.load.balance.host is NOT set: NiFi will listen for
> incoming requests on all network interfaces / hostnames. It will
> advertise its hostname to other nodes in the cluster according to
> whatever is set for the “nifi.cluster.node.address” property.
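The listen behaviour Mark describes can be reproduced with plain sockets. This is a minimal Python sketch, not NiFi code, and the addresses are illustrative: binding to a concrete address makes the port reachable only via that address, while binding to the empty string (the "property not set" case) listens on all interfaces.

```python
import socket

def bind_listener(host: str, port: int = 0) -> socket.socket:
    """Open a listening TCP socket the way a load-balance port would.

    host=""          -> listen on ALL interfaces (property not set).
    host="127.0.0.1" -> listen ONLY via that address; a node bound this
                        way is unreachable from any other machine.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, port))  # port 0 lets the OS pick a free port
    s.listen(1)
    return s

# Bound to loopback only: other hosts cannot reach this listener.
loopback_only = bind_listener("127.0.0.1")
print(loopback_only.getsockname()[0])   # 127.0.0.1

# Bound to all interfaces: reachable on every address of this machine.
all_interfaces = bind_listener("")
print(all_interfaces.getsockname()[0])  # 0.0.0.0

loopback_only.close()
all_interfaces.close()
```

This is why a netstat line showing `localhost:6342` means no other node can ever connect, while `0.0.0.0:6342` means the listener itself is fine and only the advertised hostname matters.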
> Meaning that other nodes in the cluster must be able to connect to this
> node using whatever hostname is set for the “nifi.cluster.node.address”
> property. If the “nifi.cluster.node.address” property is not set, it
> advertises its hostname as localhost - which means other nodes won’t be
> able to send to it.
>
> So you must specify either the “nifi.cluster.load.balance.host” property
> or the “nifi.cluster.node.address” property.
>
> Finally: having to delete the state directory
>
> If you change the “nifi.cluster.load.balance.host” or
> “nifi.cluster.load.balance.port” property and restart a node, you must
> restart all nodes in the cluster. Otherwise, the other nodes won’t be
> able to send to that node.
> So, for example, when you changed the load.balance.host from the FQDN or
> 0.0.0.0 to the IP address, the other nodes in the cluster would stop
> sending. I created a JIRA [1] for that. In my testing, when I changed the
> hostname, the other nodes stopped sending, but restarting them got things
> back on track. I wasn’t able to replicate the issue after restarting all
> nodes.
>
> Hope this is helpful!
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-9017
>
>
> On Aug 3, 2021, at 3:08 AM, Axel Schwarz <[email protected]> wrote:
>
> Hey guys,
>
> I think I found the "trick" for at least version 1.13.2, and of course
> I'll share it with you.
> I now use the following load balancing properties:
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host=192.168.1.10
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I use the host's IP address for balance.host instead of 0.0.0.0 or the
> FQDN, and have no balance.address property at all.
> This led to partial load balancing in my case, as already mentioned.
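The rules from Mark's explanation earlier in the thread can be collected into a quick sanity check for the load-balancing section of nifi.properties. This is a hypothetical Python helper for illustration, not an official NiFi tool:

```python
def check_load_balance_config(props: dict) -> list:
    """Return warnings for a nifi.properties-style dict, based only on
    the rules discussed in this thread (not an official validator)."""
    warnings = []
    if "nifi.cluster.load.balance.address" in props:
        warnings.append(
            "nifi.cluster.load.balance.address is not a real property; "
            "NiFi ignores it (use nifi.cluster.load.balance.host)")
    host = props.get("nifi.cluster.load.balance.host", "")
    if host == "0.0.0.0":
        warnings.append(
            "0.0.0.0 will not work: other nodes must be able to connect "
            "using this exact value")
    if not host and not props.get("nifi.cluster.node.address", ""):
        warnings.append(
            "set either nifi.cluster.load.balance.host or "
            "nifi.cluster.node.address, otherwise the node advertises "
            "itself as localhost")
    return warnings

# Axel's working configuration from above: no warnings expected.
ok = {
    "nifi.cluster.load.balance.host": "192.168.1.10",
    "nifi.cluster.load.balance.port": "6342",
}
print(check_load_balance_config(ok))        # []

# The problematic combination from earlier in the thread: the ignored
# property is set, and neither real address property is.
bad = {"nifi.cluster.load.balance.address": "0.0.0.0"}
print(len(check_load_balance_config(bad)))  # 2
```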
> It looked like I needed to do one more step to reach the goal, and this
> step seems to be deleting all state management files.
>
> Through the state-management.xml config file I changed the state
> management directory to be outside of the NiFi installation, because the
> config file says "it is important that the directory be copied over to
> the new version when upgrading NiFi". So every time I upgraded or
> reinstalled NiFi during my load balancing odyssey, the state management
> remained completely untouched.
> As soon as I changed that, by deleting the entire state management
> directory before reinstalling NiFi with the above mentioned properties,
> load balancing was immediately working throughout the whole cluster.
>
> I think for my flow it is not that bad to delete the state management, as
> I only use one stateful processor to increase some counter. And the times
> I have tried this by now, I could not observe any wrong behaviour
> whatsoever. But of course I can't test everything, so if any of you have
> some important facts about deleting the state management, please let me
> know :)
>
> Beside that, I now feel like this solved my problem. Gotta keep an eye on
> it when updating to version 1.14.0 later on, but I think I can figure
> this out. So thanks for all your support! :)
>
> --- Original Message ---
> From: "Jens M. Kofoed" <[email protected]>
> Date: 29.07.2021 11:08:28
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: Re: Re: No Load Balancing since 1.13.2
>
> Hmm...
> I can't remember :-( sorry
>
> My configuration for version 1.13.2 is like this:
>
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=nifi-node01.domaine.com
> nifi.cluster.node.protocol.port=9443
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.protocol.max.threads=50
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.node.max.concurrent.requests=100
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=3
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.address=192.168.1.11
> nifi.cluster.load.balance.port=6111
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I defined "nifi.cluster.node.address" with the hostname and not an IP
> address, and "nifi.cluster.load.balance.address" with the IP address of
> the server.
> And triple-check the configuration at all servers :-)
>
> Kind regards
> Jens M. Kofoed
>
>
> On Thu, 29 Jul 2021 at 10:11, Axel Schwarz <[email protected]> wrote:
>
> Hey Jens,
>
> in issue NIFI-8643 you wrote the last comment, with exactly the same
> behaviour as we're experiencing now: 2 of 3 nodes were load balancing.
> How did you get the third node to participate in load balancing? An
> update to 1.14.0 does not change anything for us.
>
> https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418
>
> --- Original Message ---
> From: "Jens M. Kofoed" <[email protected]>
> Date: 28.07.2021 12:07:50
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: Re: No Load Balancing since 1.13.2
>
> Hi
>
> I can see that you have configured
> nifi.cluster.load.balance.address=0.0.0.0
>
> Have you tried to set the correct IP address?
> node1: nifi.cluster.load.balance.address=192.168.1.10
> node2: nifi.cluster.load.balance.address=192.168.1.11
> node3: nifi.cluster.load.balance.address=192.168.1.12
>
> Regards
> Jens M. Kofoed
>
> On Wed, 28 Jul 2021 at 11:17, Axel Schwarz <[email protected]> wrote:
>
> Just tried Java 11. But it still does not work. Nothing changed. :(
>
> --- Original Message ---
> From: Jorge Machado <[email protected]>
> Date: 27.07.2021 13:08:55
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: No Load Balancing since 1.13.2
>
> Did you try Java 11? I have a client running a similar setup to yours,
> but with a lower NiFi version, and it works fine. Maybe it is worth a
> try.
>
> On 27. Jul 2021, at 12:42, Axel Schwarz <[email protected]> wrote:
>
> I did indeed, but I updated from u161 to u291, as this was the newest
> version at that time, because I thought it could help.
> So the issue started under u161. But I just saw that u301 is out. I will
> try this as well.
>
> --- Original Message ---
> From: Pierre Villard <[email protected]>
> Date: 27.07.2021 10:18:38
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: No Load Balancing since 1.13.2
>
> Hi,
>
> I believe the minor u291 is known to have issues (for some of its early
> builds). Did you upgrade the Java version recently?
>
> Thanks,
> Pierre
>
> On Tue, 27 Jul 2021 at 08:07, Axel Schwarz <[email protected]> wrote:
>
> Dear Community,
>
> we're running a secured 3-node NiFi cluster on Java 8u291 and Debian 7,
> and have been experiencing problems with load balancing since version
> 1.13.2.
>
> I'm fully aware of issue NIFI-8643 and tested a lot around this, but
> gotta say that this is not our problem. Mainly because the balance port
> never binds to localhost, but also because I implemented all workarounds
> under version 1.13.2 and even tried version 1.14.0 by now, but load
> balancing still does not work.
> What we experience is best described as "the primary node balances with
> itself"...
>
> So what it does is open the balancing connections to its own IP instead
> of the IPs of the other two nodes. And the other two nodes don't open
> balancing connections at all.
>
> When executing "ss | grep 6342" on the primary node, this is what it
> looks like:
>
> [root@nifiHost1 conf]# ss | grep 6342
> tcp  ESTAB  0  0  192.168.1.10:51380  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51376  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51378  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51370  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51372  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51376
> tcp  ESTAB  0  0  192.168.1.10:51374  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51374
> tcp  ESTAB  0  0  192.168.1.10:51366  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51370
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51366
> tcp  ESTAB  0  0  192.168.1.10:51368  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51372
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51378
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51368
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51380
>
> Executing it on the other non-primary nodes just returns absolutely
> nothing.
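The "balancing with itself" pattern in output like the above is easy to detect mechanically: every established connection involving the load-balance port has the same local and peer IP. A minimal Python sketch (the parsing assumes the simplified column layout shown here, not ss's full output format):

```python
def self_balancing(lines, port=6342):
    """Return True if every connection involving `port` has the same
    local and peer IP, i.e. the node only balances with itself."""
    pairs = []
    for line in lines:
        cols = line.split()
        if len(cols) < 6:
            continue
        local, peer = cols[4], cols[5]
        # Keep only rows where either endpoint uses the balance port.
        if str(port) in (local.rsplit(":", 1)[1], peer.rsplit(":", 1)[1]):
            pairs.append((local.rsplit(":", 1)[0], peer.rsplit(":", 1)[0]))
    return bool(pairs) and all(l == p for l, p in pairs)

# Two rows from the primary node's output above: both endpoints are the
# node's own IP, so the check flags it.
sample = [
    "tcp ESTAB 0 0 192.168.1.10:51380 192.168.1.10:6342",
    "tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51380",
]
print(self_balancing(sample))   # True
```

A healthy cluster would show connections to the other nodes' IPs, and the check would return False.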
> Netstat shows the following on each server:
>
> [root@nifiHost1 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.10:6342  0.0.0.0:*        LISTEN  10352/java
>
> [root@nifiHost2 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.11:6342  0.0.0.0:*        LISTEN  31562/java
>
> [root@nifiHost3 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.12:6342  0.0.0.0:*        LISTEN  31685/java
>
> And here is what our load balancing properties look like:
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host=nifiHost1.contoso.com
> nifi.cluster.load.balance.address=0.0.0.0
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> When running NiFi in version 1.12.1 on the exact same setup in the exact
> same environment, load balancing works absolutely fine.
> There was a time when load balancing even worked in version 1.13.2, but
> I'm not able to reproduce this, and it just stopped working one day after
> some restart, without changing any property whatsoever.
>
> If any more information would be helpful, please let me know and I'll
> try to provide it as fast as possible.
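One hedged guess about how an FQDN in nifi.cluster.load.balance.host can end up bound to localhost (an assumption about the reporters' environments, not something confirmed in this thread): whatever the name resolves to on that machine is what gets bound, and on some installs the host's own name maps to a loopback entry in /etc/hosts. A quick Python check of what a name actually resolves to:

```python
import socket

# "localhost" always resolves to loopback; if a node's FQDN resolves the
# same way locally, binding to it lands on loopback too.
print(socket.gethostbyname("localhost"))   # 127.0.0.1

# Before putting an FQDN into nifi.cluster.load.balance.host, check what
# it resolves to on that node (the hostname below is a placeholder):
# print(socket.gethostbyname("nifi-node01.domaine.com"))
```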
