Re: Round robin load balancing eventually stops using all nodes

Mike Thomsen Fri, 01 Apr 2022 04:29:13 -0700

When we talk about "slower nodes" here, are we referring to nodes that
are bogged down by data but of the same size as the rest of the
cluster or are we talking about a heterogeneous cluster?
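
For anyone following along, here is a toy sketch of the "slow node sets
the rate" scenario from NIFI-7081 as described further down the thread.
It is not NiFi's implementation. The function name, the per-node rates,
the queue cap (a stand-in for backpressure) and the "skip full nodes"
variant are all assumptions made up for illustration.

```python
def simulate(rates, ticks, queue_cap, send_per_tick, skip_full):
    """Toy model: return the number of items each node processed.

    rates         -- items each node can process per tick (hypothetical)
    queue_cap     -- max items a node may hold (stand-in for backpressure)
    send_per_tick -- items the sender tries to hand out each tick
    skip_full     -- False: strict round robin that waits on a full node
                     True:  skip full nodes, use the next one with room
    """
    n = len(rates)
    queued = [0] * n   # items waiting on each node
    done = [0] * n     # items each node has finished
    nxt = 0            # round-robin pointer

    for _ in range(ticks):
        sent = 0
        while sent < send_per_tick:
            if skip_full:
                # Find the next node with room, starting at the pointer.
                for step in range(n):
                    cand = (nxt + step) % n
                    if queued[cand] < queue_cap:
                        nxt = cand
                        break
                else:
                    break  # every node is full; wait for the next tick
            elif queued[nxt] >= queue_cap:
                break      # strict order: blocked until this node drains
            queued[nxt] += 1
            nxt = (nxt + 1) % n
            sent += 1

        # Each node works through its queue at its own rate.
        for i in range(n):
            worked = min(rates[i], queued[i])
            queued[i] -= worked
            done[i] += worked
    return done


# Four nodes that handle 4 items/tick and one that handles 1 item/tick.
rates = [4, 4, 4, 4, 1]
strict = simulate(rates, 100, queue_cap=4, send_per_tick=20, skip_full=False)
fluid = simulate(rates, 100, queue_cap=4, send_per_tick=20, skip_full=True)
print("strict round robin:", strict, "total", sum(strict))
print("skip full nodes:   ", fluid, "total", sum(fluid))
```

In this model the strict variant settles at about one item per node per
tick, because the sender keeps stalling on the slow node's full queue,
while the skipping variant keeps the four fast nodes busy. Whether the
slow node is "bogged down" or simply smaller hardware makes no
difference to the model; only its effective rate matters.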


On Mon, Sep 27, 2021 at 12:07 PM Joe Witt <[email protected]> wrote:
>
> Ryan,
>
> Regarding NIFI-9236: the JIRA captures it well, but it sounds like
> there is now a better understanding of how it works and what options
> exist to better view details.
>
> Regarding Load Balancing: NIFI-7081 is largely about the scenario in
> which slower nodes effectively set the rate the whole cluster can
> sustain, because we don't have a fluid load balancing strategy, which
> we should.  Such a strategy would allow the fastest nodes to always
> take the most data.  We just need to do that work.  No ETA.
>
> Thanks
>
> On Tue, Sep 21, 2021 at 2:18 PM Ryan Hendrickson
> <[email protected]> wrote:
> >
> > Joe - We're testing some scenarios.  Andrew captured some confusing 
> > behavior in the UI when enabling and disabling load balancing on a 
> > relationship: "Update UI for Clustered Connections" -- 
> > https://issues.apache.org/jira/projects/NIFI/issues/NIFI-9236
> >
> > Question - When a FlowFile is Load Balanced from one node to another, is 
> > the entire Content Claim load balanced?  Or just the small portion 
> > necessary?
> >
> > Mike -
> > We found two tickets that are in the ballpark:
> >
> > 1.  Improve handling of Load Balanced Connections when one node is slow   
> > --    https://issues.apache.org/jira/browse/NIFI-7081
> > 2.  NiFi FlowFiles stuck in queue when using Single Node load balance 
> > strategy   --    https://issues.apache.org/jira/browse/NIFI-8970
> >
> > From @Simon's comment - we know we've seen underperforming nodes in a
> > cluster before.  We're discussing whether @Simon's comment is applicable
> > to the issue we're seeing:
> >           > "The one thing I can think of is the scenario where one (or 
> > more) nodes are significantly slower than the other ones. In these cases it 
> > might happen that the nodes that are “running behind” block the other
> > nodes from a balancing perspective."
> >
> > @Simon - I'd like to understand the "blocks other nodes from balancing 
> > perspective" better if you have additional information.  We're trying to 
> > replicate this scenario.
> >
> > Thanks,
> > Ryan
> >
> > On Sat, Sep 18, 2021 at 3:45 PM Mike Thomsen <[email protected]> wrote:
> >>
> >> > there is a ticket to overcome this (there is no ETA),
> >>
> >> Do you know what the Jira # is?
> >>
> >> On Mon, Sep 6, 2021 at 7:14 AM Simon Bence <[email protected]> 
> >> wrote:
> >> >
> >> > Hi Mike,
> >> >
> >> > I did a quick check on the round robin balancing and based on what I 
> >> > found the reason for the issue must lie somewhere else, not directly 
> >> > within it. The one thing I can think of is the scenario where one (or 
> >> > more) nodes are significantly slower than the other ones. In these cases 
> >> > it might happen that the nodes that are “running behind” block the
> >> > other nodes from a balancing perspective.
> >> >
> >> > Based on what you wrote this is a possible reason and there is a ticket 
> >> > to overcome this (there is no ETA), but other details might shed light 
> >> > on a different root cause.
> >> >
> >> > Regards,
> >> > Bence
> >> >
> >> >
> >> >
> >> > > On 2021. Sep 3., at 14:13, Mike Thomsen <[email protected]> wrote:
> >> > >
> >> > > We have a 5 node cluster, and sometimes I've noticed that round robin
> >> > > load balancing stops sending flowfiles to two of them, and toward the
> >> > > end of the data processing it can get as low as a single node.
> >> > > Has anyone seen similar behavior?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Mike
> >> >
