Good question, Mark. To clarify: another NiFi cluster (I believe they are also on 1.6.0) has a Remote Process Group on their canvas configured with our cluster's URL, while we have a Root Group Input Port. By "when we add receiving Site-to-Site traffic to the mix," I just mean when they start sending to our existing port.
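In case the exact setup matters, here is a sketch of what we asked the sender to put in their RPG's URLs field (the hostnames below are placeholders, not our real nodes):

    http://nifi-node1.example.com:8080/nifi,http://nifi-node2.example.com:8080/nifi,http://nifi-node3.example.com:8080/nifi

As I understand it, the RPG only needs one reachable URL to discover the rest of the cluster's Site-to-Site peers, but listing several keeps the initial connection from depending on a single node.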
To answer Martijn's earlier question about our environment: we're running a 7-node cluster on CentOS 6.9 VMs, each with 48 GB RAM and 8 vCPUs. I'm wondering if I just need to reduce the thread counts across my cluster. We recently added several PutElasticsearchHttp processors to store data in a new Elasticsearch cluster, so it's possible this pushed our VMs over the edge of what they can handle. The reason I focused on Site-to-Site in my question is that receiving traffic really seemed to be a heavy factor in whether the cluster was stable.

Also, I'm going to check my logs for any administrative yielding, based on the NIFI-5075 issue Mark mentioned.
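For anyone following along, the log check I have in mind is just grepping the app log for the framework's yield message, something like the following (the path assumes a default install layout, and the exact message wording may differ by version):

    grep -i "administratively yield" /opt/nifi/logs/nifi-app.log*

If nothing turns up there, I'll try taking a thread dump (bin/nifi.sh dump) while the sender is transmitting and look for hot Site-to-Site threads.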
On Fri, Aug 10, 2018 at 5:27 PM Mark Payne <[email protected]> wrote:

> Joe G,
>
> Also, to clarify: when you say "when we add receiving Site-to-Site traffic
> to the mix, the CPU spikes to the point that the nodes can't talk to each
> other, resulting in the inability to view or modify the flow in the
> console," what exactly does "when we add receiving Site-to-Site traffic to
> the mix" mean? Does it mean adding an Input Port to your canvas' Root
> Group? Does it mean starting the Input Port? Or simply having the sender
> start transmitting data? Does it mean creating a Remote Process Group on
> your canvas? I'm trying to understand the exact action that is being taken
> here.
>
> The reason I ask is that there was some refactoring of the component
> lifecycles in 1.6.0, which caused Funnels that were not fully connected to
> start using a huge amount of CPU. That was addressed in NIFI-5075 [1]. I'm
> wondering if perhaps you've stumbled across something similar, related to
> Root Group Ports or RPG Ports.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-5075
>
> On Aug 10, 2018, at 5:07 PM, Joe Witt <[email protected]> wrote:
>
> Yep, what Mike points to is exactly what I was thinking of. Since you're
> on 1.6.0, the issue is probably something else. 1.6.0 included an updated
> Jersey client, or something related to that, and its performance was
> really bad for our case. In 1.7.0 it was replaced with an implementation
> leveraging OkHttp. This may be an important factor.
>
> thanks
>
> On Fri, Aug 10, 2018 at 5:02 PM Michael Moser <[email protected]> wrote:
>
> When I read this I thought of NIFI-4598 [1], and this may be what Joe
> remembers, too. If your Site-to-Site clients are older than 1.5.0, then
> maybe this is a factor?
>
> [1] - https://issues.apache.org/jira/browse/NIFI-4598
>
> -- Mike
>
> On Fri, Aug 10, 2018 at 4:43 PM Joe Witt <[email protected]> wrote:
>
> Joe G,
>
> I do recall there were some fixes and improvements related to clustering
> performance and thread pooling as they relate to Site-to-Site. I don't
> recall precisely which version they went into, but I'd strongly recommend
> trying the latest release if you're able.
>
> Thanks
>
> On Fri, Aug 10, 2018 at 4:13 PM Martijn Dekkers <[email protected]>
> wrote:
>
> What's the OS you are running on? What kind of systems? Memory stats,
> network stats, JVM stats, etc.? How much data is coming through?
>
> On 10 August 2018 at 16:06, Joe Gresock <[email protected]> wrote:
>
> Are there any NiFi developers on this list who have any suggestions?
>
> On Wed, Aug 8, 2018 at 7:38 AM Joe Gresock <[email protected]> wrote:
>
> I am running a 7-node NiFi 1.6.0 cluster that performs fairly well when
> it's simply processing its own data (putting records in Elasticsearch and
> MongoDB, running transforms, etc.). However, when we add receiving
> Site-to-Site traffic to the mix, the CPU spikes to the point that the
> nodes can't talk to each other, resulting in the inability to view or
> modify the flow in the console.
>
> I have tried some basic things to mitigate this:
> - Requested that the sending party use a comma-separated list of all 7 of
>   our nodes in their Remote Process Group that points to our cluster, in
>   hopes that that will help balance the requests
> - Requested that the sending party use some of the batching settings on
>   the Remote Port (i.e., Count = 20, Size = 100 MB, Duration = 10 sec)
> - Reduced the thread count on our Input Port to 2
>
> Are there any known nifi.properties that can be set to help mitigate this
> problem? Again, it only seems to be a problem when we are both receiving
> Site-to-Site traffic and doing our normal processing; taking each of those
> activities in isolation seems to be okay.
>
> Thanks,
> Joe

--
I know what it is to be in need, and I know what it is to have plenty. I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want. I can do
all this through him who gives me strength. -Philippians 4:12-13
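P.S. For the archives, the nifi.properties entries I plan to experiment with are the thread pools shared by Site-to-Site and cluster coordination. The property names are from the admin guide; the values shown are illustrative, not recommendations:

    # Jetty threads serve the UI/API as well as HTTP Site-to-Site requests
    nifi.web.jetty.threads=200

    # Threads used for node-to-node cluster protocol traffic
    nifi.cluster.node.protocol.threads=10

    # Raw-socket Site-to-Site listener; HTTP transport can be turned off
    # if the sender uses the RAW transport protocol
    nifi.remote.input.socket.port=10443
    nifi.remote.input.http.enabled=true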
