Good question, Mark. To clarify: another NiFi cluster (I believe they are also on 1.6.0) has a Remote Process Group on their canvas configured with our cluster's URL, while we have a Root Group Input Port. By "when we add receiving Site-to-Site traffic to the mix," I just mean when they start sending to our existing port.
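In case the exact setup matters, here is a sketch of what we asked the sender to put in their RPG's URLs field (the hostnames below are placeholders, not our real nodes):

    http://nifi-node1.example.com:8080/nifi,http://nifi-node2.example.com:8080/nifi,http://nifi-node3.example.com:8080/nifi

As I understand it, the RPG only needs one reachable URL to discover the rest of the cluster's Site-to-Site peers, but listing several keeps the initial connection from depending on a single node.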
To answer Martijn's earlier question about our environment: we're running a 7-node cluster on CentOS 6.9 VMs, each with 48 GB RAM and 8 vCPUs. I'm wondering if I just need to reduce the thread counts across my cluster. We recently added several PutElasticsearchHttp processors to store data in a new Elasticsearch cluster, so it's possible this pushed our VMs over the edge of what they can handle. The reason I focused on Site-to-Site in my question is that receiving traffic really seemed to be a heavy factor in whether the cluster was stable.

Also, I'm going to check my logs for any administrative yielding, based on the NIFI-5075 issue Mark mentioned.
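For anyone following along, the log check I have in mind is just grepping the app log for the framework's yield message, something like the following (the path assumes a default install layout, and the exact message wording may differ by version):

    grep -i "administratively yield" /opt/nifi/logs/nifi-app.log*

If nothing turns up there, I'll try taking a thread dump (bin/nifi.sh dump) while the sender is transmitting and look for hot Site-to-Site threads.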
On Fri, Aug 10, 2018 at 5:27 PM Mark Payne <[email protected]> wrote:

> Joe G,
>
> Also, to clarify: when you say "when we add receiving Site-to-Site traffic
> to the mix, the CPU spikes to the point that the nodes can't talk to each
> other, resulting in the inability to view or modify the flow in the
> console," what exactly does "when we add receiving Site-to-Site traffic to
> the mix" mean? Does it mean adding an Input Port to your canvas' Root
> Group? Does it mean starting the Input Port? Or simply having the sender
> start transmitting data? Does it mean creating a Remote Process Group on
> your canvas? I'm trying to understand the exact action that is being taken
> here.
>
> The reason I ask is that there was some refactoring of the component
> lifecycles in 1.6.0, which caused Funnels that were not fully connected to
> start using a huge amount of CPU. That was addressed in NIFI-5075 [1]. I'm
> wondering if perhaps you've stumbled across something similar, related to
> Root Group Ports or RPG Ports.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-5075
>
> On Aug 10, 2018, at 5:07 PM, Joe Witt <[email protected]> wrote:
>
> Yep, what Mike points to is exactly what I was thinking of. Since you're
> on 1.6.0, the issue is probably something else. 1.6.0 included an updated
> Jersey client, or something related to that, and its performance was
> really bad for our case. In 1.7.0 it was replaced with an implementation
> leveraging OkHttp. This may be an important factor.
>
> thanks
>
> On Fri, Aug 10, 2018 at 5:02 PM Michael Moser <[email protected]> wrote:
>
> When I read this I thought of NIFI-4598 [1], and this may be what Joe
> remembers, too. If your Site-to-Site clients are older than 1.5.0, then
> maybe this is a factor?
>
> [1] - https://issues.apache.org/jira/browse/NIFI-4598
>
> -- Mike
>
> On Fri, Aug 10, 2018 at 4:43 PM Joe Witt <[email protected]> wrote:
>
> Joe G,
>
> I do recall there were some fixes and improvements related to clustering
> performance and thread pooling as they relate to Site-to-Site. I don't
> recall precisely which version they went into, but I'd strongly recommend
> trying the latest release if you're able.
>
> Thanks
>
> On Fri, Aug 10, 2018 at 4:13 PM Martijn Dekkers <[email protected]>
> wrote:
>
> What's the OS you are running on? What kind of systems? Memory stats,
> network stats, JVM stats, etc.? How much data is coming through?
>
> On 10 August 2018 at 16:06, Joe Gresock <[email protected]> wrote:
>
> Are there any NiFi developers on this list who have any suggestions?
>
> On Wed, Aug 8, 2018 at 7:38 AM Joe Gresock <[email protected]> wrote:
>
> I am running a 7-node NiFi 1.6.0 cluster that performs fairly well when
> it's simply processing its own data (putting records in Elasticsearch and
> MongoDB, running transforms, etc.). However, when we add receiving
> Site-to-Site traffic to the mix, the CPU spikes to the point that the
> nodes can't talk to each other, resulting in the inability to view or
> modify the flow in the console.
>
> I have tried some basic things to mitigate this:
> - Requested that the sending party use a comma-separated list of all 7 of
>   our nodes in their Remote Process Group that points to our cluster, in
>   hopes that that will help balance the requests
> - Requested that the sending party use some of the batching settings on
>   the Remote Port (i.e., Count = 20, Size = 100 MB, Duration = 10 sec)
> - Reduced the thread count on our Input Port to 2
>
> Are there any known nifi.properties that can be set to help mitigate this
> problem? Again, it only seems to be a problem when we are both receiving
> Site-to-Site traffic and doing our normal processing; taking each of those
> activities in isolation seems to be okay.
>
> Thanks,
> Joe

--
I know what it is to be in need, and I know what it is to have plenty. I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want. I can do
all this through him who gives me strength. -Philippians 4:12-13
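P.S. For the archives, the nifi.properties entries I plan to experiment with are the thread pools shared by Site-to-Site and cluster coordination. The property names are from the admin guide; the values shown are illustrative, not recommendations:

    # Jetty threads serve the UI/API as well as HTTP Site-to-Site requests
    nifi.web.jetty.threads=200

    # Threads used for node-to-node cluster protocol traffic
    nifi.cluster.node.protocol.threads=10

    # Raw-socket Site-to-Site listener; HTTP transport can be turned off
    # if the sender uses the RAW transport protocol
    nifi.remote.input.socket.port=10443
    nifi.remote.input.http.enabled=true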
