Hi Isha,

Thanks for providing some background on the configuration and related issues. Based on the issues you highlighted, it sounds like you are running into several known problems. There are some potential workarounds, but refactoring the flow configuration to use standard connection load balancing is the best solution. Upgrading to NiFi 1.15.3 addresses a number of security and performance issues, including some of the items you mentioned.

Related to the first problem, RAW socket communication should be preferred for RPG communication. RAW socket communication is not subject to the Denial-of-Service filter timeout, and also has less overhead than HTTP request processing. Ensuring that all Remote Process Groups use RAW socket communication should help.

When HTTP requests exceed the DoS filter timeout, Jetty terminates the connection, which can produce any number of errors, such as the End-of-File and Connection Closed issues you have observed. HTTP communication also uses threads from the Jetty server, which can impact user interface performance. This might also be part of the explanation for cluster nodes getting out of sync, but there could be other factors involved.

NiFi 1.12.1 has several issues related to the Denial-of-Service filter and Site-to-Site communication, which have been addressed in more recent releases. Here are a few worth noting:

- https://issues.apache.org/jira/browse/NIFI-7912
  Added new nifi.web.request properties that can be used to change the default 30 second timeout and exclude IP addresses from filtering
- https://issues.apache.org/jira/browse/NIFI-9448
  Resolved potential IllegalStateException for S2S client communication
- https://issues.apache.org/jira/browse/NIFI-9481
  Exclude HTTP Site-to-Site Communication from DoS Filter

The last issue is not yet part of a released version, but the other two are resolved in NiFi 1.15.3.

Although upgrading and migrating to connection load balancing will take some work, it is the best path forward to address the issues you observed.

Regards,
David Handermann

On Wed, Feb 23, 2022 at 11:55 AM Isha Lamboo <[email protected]> wrote:

> Hi all,
>
> I’m hoping to get some perspective from people that have NiFi with a large
> number of Remote Process Groups.
>
> I’m supporting a NiFi 1.12.1 (yes, I know) cluster of 3 nodes that has
> about 5k processors and load-balancing still done the pre-1.8 way, with
> RPGs looping back to the local cluster. There are 500+ RPGs with only about
> 30 actually going to other NiFi clusters.
>
> We’re having several problems:
>
> - input ports getting stuck when the RPG is set to HTTP protocol and
>   connections get killed by the Jetty DoS filter after 30 secs. The standard
>   is RAW, but sometimes a HTTP RPG still gets deployed.
> - Intermittent errors like EoF, connection closed etc on HTTP
>   connections
> - The cluster being unable to sync changes made to the flow resulting
>   in disconnected nodes and sometimes uninheritable flow exceptions.
>
> My idea is that the RPGs should be replaced by load-balanced connection
> and/or local ports, but developer resources are scarce, so I want to either
> make a business case or tune NiFi performance if 500 RPGs should not cause
> problems normally.
>
> So is this a known issue or particular to my case? How can I
> identify/solve performance bottlenecks with RPGs?
>
> Kind regards,
>
> Isha Lamboo
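
On the point about keeping every Remote Process Group on RAW: one way to spot stragglers is to scan the flow definition for RPGs whose transport protocol is HTTP. The Python sketch below is only an illustration, not an official NiFi tool. It assumes a NiFi 1.x flow.xml in which each RPG is a `remoteProcessGroup` element with `id`, `name`, and `transportProtocol` child elements; verify those names against your own flow.xml.gz before relying on the output. The function and file names are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def find_http_rpgs(flow_xml):
    """Return (id, name) pairs for Remote Process Groups set to HTTP.

    Assumes element names "remoteProcessGroup", "id", "name" and
    "transportProtocol", as expected in a NiFi 1.x flow.xml.
    """
    root = ET.fromstring(flow_xml)
    matches = []
    # iter() also visits RPGs inside nested process groups.
    for rpg in root.iter("remoteProcessGroup"):
        protocol = (rpg.findtext("transportProtocol") or "").strip().upper()
        if protocol == "HTTP":
            matches.append((rpg.findtext("id", default=""),
                            rpg.findtext("name", default="")))
    return matches


def load_flow(path):
    """Read a gzip-compressed flow definition into a string."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()
```

A typical call would be `find_http_rpgs(load_flow("conf/flow.xml.gz"))`, where the path is an assumption about a default installation. Run it against a copy of the file rather than the live one.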
