Hi Isha,

Thanks for providing some background on the configuration and related
issues. Based on what you described, it sounds like you are running into
several known problems. There are some potential workarounds, but
refactoring the flow to use standard connection load balancing is the best
solution. Upgrading to NiFi 1.15.3 would also address a number of security
and performance issues, including some of the items you mentioned.

Regarding the first problem, Remote Process Groups should use RAW socket
communication rather than HTTP. RAW socket communication is not subject to
the Denial-of-Service filter timeout, and it has less overhead than HTTP
request processing, so ensuring that all Remote Process Groups use RAW
should help. When an HTTP request exceeds the DoS filter timeout, Jetty
terminates the connection, which can produce a variety of errors, such as
the End-of-File and Connection Closed errors you have observed.
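If it helps to find the HTTP RPGs that still slip through, a small script
could scan the flow definition. The Python sketch below assumes the NiFi 1.x
flow.xml.gz layout, in which each remoteProcessGroup element has id, name and
transportProtocol children; please check those element names against your own
flow.xml.gz before relying on it. The function names are only illustrative.

```python
from __future__ import annotations

import gzip
import xml.etree.ElementTree as ET


def load_flow(path: str) -> bytes:
    """Read the gzip-compressed flow definition (e.g. conf/flow.xml.gz)."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


def find_http_rpgs(flow_xml: str | bytes) -> list[tuple[str, str]]:
    """Return (id, name) for every Remote Process Group set to HTTP.

    Assumes the NiFi 1.x flow.xml layout, in which each <remoteProcessGroup>
    element has <id>, <name> and <transportProtocol> children.
    """
    root = ET.fromstring(flow_xml)
    offenders = []
    # iter() also descends into nested process groups, not just the root group
    for rpg in root.iter("remoteProcessGroup"):
        protocol = (rpg.findtext("transportProtocol") or "").strip().upper()
        # RPGs without a transportProtocol element are not flagged
        if protocol == "HTTP":
            # findtext() on direct children only, so port ids are not picked up
            offenders.append((rpg.findtext("id", ""), rpg.findtext("name", "")))
    return offenders
```

For example, find_http_rpgs(load_flow("conf/flow.xml.gz")) would list the id
and name of each HTTP RPG, which could then be switched to RAW. Running it
against the flow of each deployment would catch regressions early.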

HTTP Site-to-Site communication also consumes threads from the Jetty server,
which can degrade user interface performance. This might also partly explain
the cluster nodes getting out of sync, although other factors could be
involved.

NiFi 1.12.1 has several known issues related to the Denial-of-Service filter
and Site-to-Site communication that have been addressed in more recent
releases. Here are a few worth noting:

- https://issues.apache.org/jira/browse/NIFI-7912 Added new
nifi.web.request properties that can be used to change the default
30-second timeout and exclude IP addresses from filtering
- https://issues.apache.org/jira/browse/NIFI-9448 Resolved potential
IllegalStateException for S2S client communication
- https://issues.apache.org/jira/browse/NIFI-9481 Exclude HTTP Site-to-Site
Communication from DoS Filter

The last issue is not yet part of a released version, but the other two are
resolved in NiFi 1.15.3.
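After upgrading, a quick way to confirm the relevant settings on each node is
to read nifi.properties. The Python sketch below is a minimal example under
some assumptions: the two nifi.web.request.* names are how I understand the
NIFI-7912 additions, and nifi.remote.input.socket.port and
nifi.remote.input.http.enabled are the Site-to-Site input settings. Please
verify all of them against the Administration Guide for your release. The
parser is deliberately simple and ignores escapes and line continuations.

```python
from __future__ import annotations

# Property names as I understand them; verify against the Administration
# Guide for the release you are running.
TIMEOUT_KEY = "nifi.web.request.timeout"            # assumed NIFI-7912 property
IP_EXCLUSION_KEY = "nifi.web.request.ip.whitelist"  # assumed NIFI-7912 property
RAW_PORT_KEY = "nifi.remote.input.socket.port"
HTTP_S2S_KEY = "nifi.remote.input.http.enabled"


def parse_properties(text: str) -> dict[str, str]:
    """Minimal key=value reader; skips comments, ignores escapes/continuations."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def review(props: dict[str, str]) -> list[str]:
    """Return human-readable findings about DoS filter and Site-to-Site settings."""
    findings = []
    if TIMEOUT_KEY not in props:
        findings.append(f"{TIMEOUT_KEY} absent: the built-in DoS filter timeout applies")
    else:
        findings.append(f"{TIMEOUT_KEY} = {props[TIMEOUT_KEY] or '(empty)'}")
    if not props.get(IP_EXCLUSION_KEY):
        findings.append(f"{IP_EXCLUSION_KEY} empty: no addresses excluded from filtering")
    # RAW Site-to-Site needs a socket port configured on the receiving cluster
    if not props.get(RAW_PORT_KEY):
        findings.append(f"{RAW_PORT_KEY} empty: RAW Site-to-Site is not available")
    if props.get(HTTP_S2S_KEY, "").lower() == "true":
        findings.append(f"{HTTP_S2S_KEY} = true: HTTP Site-to-Site can still be selected")
    return findings
```

Running review(parse_properties(text)) on each node's file makes it easy to
spot a node whose settings differ from the others.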

Although upgrading and migrating to connection load balancing will take
some work, it is the best path forward to address the issues you observed.
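To size that work for the business case, it may help to count how many RPGs
loop back to the same cluster (the candidates for load-balanced connections)
versus how many target other clusters and need to stay. This Python sketch
assumes the NiFi 1.x flow.xml layout, where each remoteProcessGroup element
has a comma-separated urls child or an older url child; the set of local host
names is something you would supply yourself.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def classify_rpgs(flow_xml: str | bytes, local_hosts: set[str]) -> Counter:
    """Count RPGs that loop back to this cluster versus those targeting others.

    Assumes NiFi 1.x flow.xml, where <remoteProcessGroup> has a <urls>
    (comma-separated) or <url> child holding the target cluster URL(s).
    """
    local = {h.lower() for h in local_hosts}
    counts: Counter = Counter()
    for rpg in ET.fromstring(flow_xml).iter("remoteProcessGroup"):
        raw = rpg.findtext("urls") or rpg.findtext("url") or ""
        # urlparse().hostname is already lower-cased
        hosts = {
            (urlparse(u.strip()).hostname or "").lower()
            for u in raw.split(",")
            if u.strip()
        }
        if not hosts:
            counts["unknown"] += 1
        elif hosts <= local:
            counts["loopback"] += 1
        else:
            # any non-local target URL makes the RPG count as external
            counts["external"] += 1
    return counts
```

With roughly 500 RPGs and about 30 going elsewhere, the loopback count would
be the main measure of how much refactoring the migration involves.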

Regards,
David Handermann

On Wed, Feb 23, 2022 at 11:55 AM Isha Lamboo wrote:

> Hi all,
>
> I’m hoping to get some perspective from people who run NiFi with a large
> number of Remote Process Groups.
>
> I’m supporting a NiFi 1.12.1 (yes, I know) cluster of 3 nodes that has
> about 5k processors and load balancing still done the pre-1.8 way, with
> RPGs looping back to the local cluster. There are 500+ RPGs, of which only
> about 30 actually go to other NiFi clusters.
>
> We’re having several problems:
>
>    - Input ports getting stuck when the RPG is set to the HTTP protocol and
>    connections get killed by the Jetty DoS filter after 30 secs. The
>    standard is RAW, but sometimes an HTTP RPG still gets deployed.
>    - Intermittent errors like EOF, connection closed, etc. on HTTP
>    connections.
>    - The cluster being unable to sync changes made to the flow, resulting
>    in disconnected nodes and sometimes uninheritable flow exceptions.
>
> My idea is that the RPGs should be replaced by load-balanced connections
> and/or local ports, but developer resources are scarce, so I want to either
> make a business case or tune NiFi performance if 500 RPGs should not
> normally cause problems.
>
> So is this a known issue or particular to my case? How can I
> identify/solve performance bottlenecks with RPGs?
>
> Kind regards,
>
> Isha Lamboo
>
