Hi Mark

In version 1.13.2 (at least) the file
"main/nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java"
is looking for a property called "nifi.cluster.load.balance.address", which has
been reported in https://issues.apache.org/jira/browse/NIFI-8643 and fixed in
version 1.14.0.
In version 1.14.0 the only way I can get it to work is if I type in the IP
address. If I don't specify it, or type in the FQDN, the load balance port
will bind to localhost, which has been reported in
https://issues.apache.org/jira/browse/NIFI-9010

The result from running netstat:

netstat -l
tcp        0      0 localhost:6342          0.0.0.0:*               LISTEN

Kind regards
Jens M. Kofoed

On Thu, 5 Aug 2021 at 23:08, Mark Payne <[email protected]> wrote:

> Axel,
>
> I think that I can help clarify some of these things.
>
> First of all: nifi.cluster.load.balance.host vs. nifi.cluster.load.balance.address
>
> * The nifi.cluster.load.balance.host property is what matters.
>
> * The nifi.cluster.load.balance.address is not a real property. NiFi has
> never looked at this property. However, in the first release that included
> load balancing, there was a typo in which the nifi.properties file had
> “…address” instead of “…host”. This was later addressed.
>
> * So if you have a value for “nifi.cluster.load.balance.address”, it does
> nothing and is always ignored.
>
> Next: the nifi.cluster.load.balance.host property
>
> * nifi.cluster.load.balance.host can be either an IP address or a
> hostname. But if set, other nodes in the cluster MUST be able to
> communicate with the node using whatever value you put here, so using a
> value of 0.0.0.0 will not work. Also, if set, NiFi will listen for
> incoming connections ONLY on that hostname. If you set it to “localhost”,
> for instance, no other node can connect to it, because no other host can
> reach the node using “localhost”. So this needs to be an address that the
> NiFi instance knows about / can bind to, and that other nodes in the
> cluster can connect to.
>
> * If nifi.cluster.load.balance.host is NOT set: NiFi will listen for
> incoming requests on all network interfaces / hostnames. It will
> advertise its hostname to other nodes in the cluster according to
> whatever is set for the “nifi.cluster.node.address” property.
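The listen behaviour Mark describes can be reproduced with plain sockets. This is a minimal Python sketch, not NiFi code, and the addresses are illustrative: binding to a concrete address makes the port reachable only via that address, while binding to the empty string (the "property not set" case) listens on all interfaces.

```python
import socket

def bind_listener(host: str, port: int = 0) -> socket.socket:
    """Open a listening TCP socket the way a load-balance port would.

    host=""          -> listen on ALL interfaces (property not set).
    host="127.0.0.1" -> listen ONLY via that address; a node bound this
                        way is unreachable from any other machine.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, port))  # port 0 lets the OS pick a free port
    s.listen(1)
    return s

# Bound to loopback only: other hosts cannot reach this listener.
loopback_only = bind_listener("127.0.0.1")
print(loopback_only.getsockname()[0])   # 127.0.0.1

# Bound to all interfaces: reachable on every address of this machine.
all_interfaces = bind_listener("")
print(all_interfaces.getsockname()[0])  # 0.0.0.0

loopback_only.close()
all_interfaces.close()
```

This is why a netstat line showing `localhost:6342` means no other node can ever connect, while `0.0.0.0:6342` means the listener itself is fine and only the advertised hostname matters.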
> Meaning that other nodes in the cluster must be able to connect to this
> node using whatever hostname is set for the “nifi.cluster.node.address”
> property. If the “nifi.cluster.node.address” property is not set, it
> advertises its hostname as localhost - which means other nodes won’t be
> able to send to it.
>
> So you must specify either the “nifi.cluster.load.balance.host” property
> or the “nifi.cluster.node.address” property.
>
> Finally: having to delete the state directory
>
> If you change the “nifi.cluster.load.balance.host” or
> “nifi.cluster.load.balance.port” property and restart a node, you must
> restart all nodes in the cluster. Otherwise, the other nodes won’t be
> able to send to that node.
> So, for example, when you changed the load.balance.host from the FQDN or
> 0.0.0.0 to the IP address, the other nodes in the cluster would stop
> sending. I created a JIRA [1] for that. In my testing, when I changed the
> hostname, the other nodes stopped sending, but restarting them got things
> back on track. I wasn’t able to replicate the issue after restarting all
> nodes.
>
> Hope this is helpful!
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-9017
>
>
> On Aug 3, 2021, at 3:08 AM, Axel Schwarz <[email protected]> wrote:
>
> Hey guys,
>
> I think I found the "trick" for at least version 1.13.2, and of course
> I'll share it with you.
> I now use the following load balancing properties:
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host=192.168.1.10
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I use the host's IP address for balance.host instead of 0.0.0.0 or the
> FQDN, and have no balance.address property at all.
> This led to partial load balancing in my case, as already mentioned.
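The rules from Mark's explanation earlier in the thread can be collected into a quick sanity check for the load-balancing section of nifi.properties. This is a hypothetical Python helper for illustration, not an official NiFi tool:

```python
def check_load_balance_config(props: dict) -> list:
    """Return warnings for a nifi.properties-style dict, based only on
    the rules discussed in this thread (not an official validator)."""
    warnings = []
    if "nifi.cluster.load.balance.address" in props:
        warnings.append(
            "nifi.cluster.load.balance.address is not a real property; "
            "NiFi ignores it (use nifi.cluster.load.balance.host)")
    host = props.get("nifi.cluster.load.balance.host", "")
    if host == "0.0.0.0":
        warnings.append(
            "0.0.0.0 will not work: other nodes must be able to connect "
            "using this exact value")
    if not host and not props.get("nifi.cluster.node.address", ""):
        warnings.append(
            "set either nifi.cluster.load.balance.host or "
            "nifi.cluster.node.address, otherwise the node advertises "
            "itself as localhost")
    return warnings

# Axel's working configuration from above: no warnings expected.
ok = {
    "nifi.cluster.load.balance.host": "192.168.1.10",
    "nifi.cluster.load.balance.port": "6342",
}
print(check_load_balance_config(ok))        # []

# The problematic combination from earlier in the thread: the ignored
# property is set, and neither real address property is.
bad = {"nifi.cluster.load.balance.address": "0.0.0.0"}
print(len(check_load_balance_config(bad)))  # 2
```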
> It looked like I needed to do one more step to reach the goal, and this
> step seems to be deleting all state management files.
>
> Through the state-management.xml config file I changed the state
> management directory to be outside of the NiFi installation, because the
> config file says "it is important that the directory be copied over to
> the new version when upgrading NiFi". So every time I upgraded or
> reinstalled NiFi during my load balancing odyssey, the state management
> remained completely untouched.
> As soon as I changed that, by deleting the entire state management
> directory before reinstalling NiFi with the above mentioned properties,
> load balancing was immediately working throughout the whole cluster.
>
> I think for my flow it is not that bad to delete the state management, as
> I only use one stateful processor to increase some counter. And the times
> I have tried this by now, I could not observe any wrong behaviour
> whatsoever. But of course I can't test everything, so if any of you have
> some important facts about deleting the state management, please let me
> know :)
>
> Beside that, I now feel like this solved my problem. Gotta keep an eye on
> it when updating to version 1.14.0 later on, but I think I can figure
> this out. So thanks for all your support! :)
>
> --- Original Message ---
> From: "Jens M. Kofoed" <[email protected]>
> Date: 29.07.2021 11:08:28
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: Re: Re: No Load Balancing since 1.13.2
>
> Hmm...
> I can't remember :-( sorry
>
> My configuration for version 1.13.2 is like this:
>
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=nifi-node01.domaine.com
> nifi.cluster.node.protocol.port=9443
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.protocol.max.threads=50
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.node.max.concurrent.requests=100
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=3
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.address=192.168.1.11
> nifi.cluster.load.balance.port=6111
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I defined "nifi.cluster.node.address" with the hostname and not an IP
> address, and "nifi.cluster.load.balance.address" with the IP address of
> the server.
> And triple-check the configuration at all servers :-)
>
> Kind regards
> Jens M. Kofoed
>
>
> On Thu, 29 Jul 2021 at 10:11, Axel Schwarz <[email protected]> wrote:
>
> Hey Jens,
>
> in issue NIFI-8643 you wrote the last comment, with exactly the same
> behaviour as we're experiencing now: 2 of 3 nodes were load balancing.
> How did you get the third node to participate in load balancing? An
> update to 1.14.0 does not change anything for us.
>
> https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418
>
> --- Original Message ---
> From: "Jens M. Kofoed" <[email protected]>
> Date: 28.07.2021 12:07:50
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: Re: No Load Balancing since 1.13.2
>
> Hi
>
> I can see that you have configured
> nifi.cluster.load.balance.address=0.0.0.0
>
> Have you tried to set the correct IP address?
> node1: nifi.cluster.load.balance.address=192.168.1.10
> node2: nifi.cluster.load.balance.address=192.168.1.11
> node3: nifi.cluster.load.balance.address=192.168.1.12
>
> Regards
> Jens M. Kofoed
>
> On Wed, 28 Jul 2021 at 11:17, Axel Schwarz <[email protected]> wrote:
>
> Just tried Java 11. But it still does not work. Nothing changed. :(
>
> --- Original Message ---
> From: Jorge Machado <[email protected]>
> Date: 27.07.2021 13:08:55
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: No Load Balancing since 1.13.2
>
> Did you try Java 11? I have a client running a similar setup to yours,
> but with a lower NiFi version, and it works fine. Maybe it is worth a
> try.
>
> On 27. Jul 2021, at 12:42, Axel Schwarz <[email protected]> wrote:
>
> I did indeed, but I updated from u161 to u291, as this was the newest
> version at that time, because I thought it could help.
> So the issue started under u161. But I just saw that u301 is out. I will
> try this as well.
>
> --- Original Message ---
> From: Pierre Villard <[email protected]>
> Date: 27.07.2021 10:18:38
> To: [email protected], Axel Schwarz <[email protected]>
> Subject: Re: No Load Balancing since 1.13.2
>
> Hi,
>
> I believe the minor u291 is known to have issues (for some of its early
> builds). Did you upgrade the Java version recently?
>
> Thanks,
> Pierre
>
> On Tue, 27 Jul 2021 at 08:07, Axel Schwarz <[email protected]> wrote:
>
> Dear Community,
>
> we're running a secured 3-node NiFi cluster on Java 8u291 and Debian 7,
> and have been experiencing problems with load balancing since version
> 1.13.2.
>
> I'm fully aware of issue NIFI-8643 and tested a lot around this, but
> gotta say that this is not our problem. Mainly because the balance port
> never binds to localhost, but also because I implemented all workarounds
> under version 1.13.2 and even tried version 1.14.0 by now, but load
> balancing still does not work.
> What we experience is best described as "the primary node balances with
> itself"...
>
> So what it does is open the balancing connections to its own IP instead
> of the IPs of the other two nodes. And the other two nodes don't open
> balancing connections at all.
>
> When executing "ss | grep 6342" on the primary node, this is what it
> looks like:
>
> [root@nifiHost1 conf]# ss | grep 6342
> tcp  ESTAB  0  0  192.168.1.10:51380  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51376  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51378  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51370  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:51372  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51376
> tcp  ESTAB  0  0  192.168.1.10:51374  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51374
> tcp  ESTAB  0  0  192.168.1.10:51366  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51370
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51366
> tcp  ESTAB  0  0  192.168.1.10:51368  192.168.1.10:6342
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51372
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51378
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51368
> tcp  ESTAB  0  0  192.168.1.10:6342   192.168.1.10:51380
>
> Executing it on the other non-primary nodes just returns absolutely
> nothing.
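The "balancing with itself" pattern in output like the above is easy to detect mechanically: every established connection involving the load-balance port has the same local and peer IP. A minimal Python sketch (the parsing assumes the simplified column layout shown here, not ss's full output format):

```python
def self_balancing(lines, port=6342):
    """Return True if every connection involving `port` has the same
    local and peer IP, i.e. the node only balances with itself."""
    pairs = []
    for line in lines:
        cols = line.split()
        if len(cols) < 6:
            continue
        local, peer = cols[4], cols[5]
        # Keep only rows where either endpoint uses the balance port.
        if str(port) in (local.rsplit(":", 1)[1], peer.rsplit(":", 1)[1]):
            pairs.append((local.rsplit(":", 1)[0], peer.rsplit(":", 1)[0]))
    return bool(pairs) and all(l == p for l, p in pairs)

# Two rows from the primary node's output above: both endpoints are the
# node's own IP, so the check flags it.
sample = [
    "tcp ESTAB 0 0 192.168.1.10:51380 192.168.1.10:6342",
    "tcp ESTAB 0 0 192.168.1.10:6342 192.168.1.10:51380",
]
print(self_balancing(sample))   # True
```

A healthy cluster would show connections to the other nodes' IPs, and the check would return False.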
> Netstat shows the following on each server:
>
> [root@nifiHost1 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.10:6342  0.0.0.0:*        LISTEN  10352/java
>
> [root@nifiHost2 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.11:6342  0.0.0.0:*        LISTEN  31562/java
>
> [root@nifiHost3 conf]# netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
> tcp        0      0 192.168.1.12:6342  0.0.0.0:*        LISTEN  31685/java
>
> And here is what our load balancing properties look like:
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host=nifiHost1.contoso.com
> nifi.cluster.load.balance.address=0.0.0.0
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> When running NiFi in version 1.12.1 on the exact same setup in the exact
> same environment, load balancing works absolutely fine.
> There was a time when load balancing even worked in version 1.13.2, but
> I'm not able to reproduce this, and it just stopped working one day after
> some restart, without changing any property whatsoever.
>
> If any more information would be helpful, please let me know and I'll
> try to provide it as fast as possible.
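One hedged guess about how an FQDN in nifi.cluster.load.balance.host can end up bound to localhost (an assumption about the reporters' environments, not something confirmed in this thread): whatever the name resolves to on that machine is what gets bound, and on some installs the host's own name maps to a loopback entry in /etc/hosts. A quick Python check of what a name actually resolves to:

```python
import socket

# "localhost" always resolves to loopback; if a node's FQDN resolves the
# same way locally, binding to it lands on loopback too.
print(socket.gethostbyname("localhost"))   # 127.0.0.1

# Before putting an FQDN into nifi.cluster.load.balance.host, check what
# it resolves to on that node (the hostname below is a placeholder):
# print(socket.gethostbyname("nifi-node01.domaine.com"))
```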
