Eric,

Just wanted to add some thoughts…

To help manage that many components, I’d definitely recommend modifying the 
“nifi.bored.yield.duration” setting.  The default is 10 ms… I’d recommend 
increasing it considerably if you’re planning to have tens of thousands of 
running components on a single canvas.  This setting controls how long a 
component with no work to do yields before checking again… increasing the 
bored duration reduces the time components spend checking for work.

It might introduce some additional latency into a flow, but once a component 
sees it has data to work on, it will continue to run based on the component’s 
run schedule.
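
For reference, the property lives in conf/nifi.properties and would look 
something like the following (the 500 millis value is only an illustrative 
starting point, not a tuned recommendation for your flow):

    # How long a component with no work to do yields before it is
    # scheduled again. Default is 10 millis; raising it cuts scheduler
    # churn on very large canvases.
    nifi.bored.yield.duration=500 millis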

Also… I’d recommend breaking your flows up across separate instances, and/or 
looking to consolidate some functionality.  I don’t know how you keep track of 
that many components, but it sounds like a headache :)
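
To illustrate Joe’s point below about the “is work to do” check: a typical 
processor’s onTrigger simply returns when nothing is queued, so an idle 
component costs very little even on a 0 sec run schedule.  A minimal sketch 
against the standard Processor API (the class name is just for illustration):

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ExampleProcessor extends AbstractProcessor {
        @Override
        public void onTrigger(final ProcessContext context,
                              final ProcessSession session)
                throws ProcessException {
            final FlowFile flowFile = session.get();
            if (flowFile == null) {
                // Nothing queued: bail out immediately. The framework then
                // yields for nifi.bored.yield.duration before scheduling
                // this processor again.
                return;
            }
            // Real work only happens when data is actually present.
            session.remove(flowFile); // sketch only; a real processor
                                      // transfers to a relationship
        }
    }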

Thanks,
Phil
On Mar 10, 2023 at 11:44 AM -0500, Joe Witt <[email protected]>, wrote:
> The framework attempts to schedule every processor as often as it
> asks to run.  If it asks to run every 0 seconds, that translates to 'run
> pretty darn often/fast'.  However, we don't actually invoke the code
> in most cases, because the 'is there work to do' check will fail when
> no flowfile is sitting there.  So you'd not really burn resources
> meaningfully in that model.  This is part of why it scales so well
> with so many flows all on the same nodes all the time.
> But you might want to lower the scheduled run frequency of processors
> that source data, as those will always say 'there is work to do'.
>
> Thanks
>
> On Fri, Mar 10, 2023 at 9:26 AM Eric Secules <[email protected]> wrote:
> >
> > Hi Joe,
> >
> > Thanks for the reply.  The reasoning behind my use case for node-slicing of 
> > flows is the assumption that I would otherwise need several VMs with 
> > higher memory allocation for them to hold all of the flows, still have 
> > room for active flowfiles, and also have the processing capacity to handle 
> > the traffic.  I expect traffic to have a daily peak and then taper off to 
> > zero activity.  I certainly don't expect all processors to have flowfiles 
> > in their input queues at all times.  I expect a couple of flows to process 
> > a million flowfiles a day, while others might see only a few hundred.  
> > They're all configured to run every 0 seconds.  Does the scheduler try to 
> > run them all, or does it only run processors that have flowfiles in their 
> > input queue and processors that have no input?
> >
> > Thanks,
> > Eric
> >
> > On Thu, Mar 9, 2023 at 10:32 AM Joe Witt <[email protected]> wrote:
> > >
> > > Eric
> > >
> > > There is a practical limit in terms of memory, browser performance,
> > > etc., but there isn't otherwise any real hard limit set.  We've
> > > seen flows with many tens of thousands of processors that are part
> > > of what can be many dozens or hundreds of process groups.  But the
> > > challenge that comes up is less about the number of components and
> > > more about the sheer reality of running that many different flows
> > > within a single host system.  Now, sometimes people building flows
> > > like that don't have actual live/high-volume streams through all of
> > > them all the time.  Often that is used for more job/scheduled-type
> > > flows that run periodically.  That is different and can work out
> > > depending on time slicing, etc.
> > >
> > > The entire notion of how NiFi's clustering is designed and works is
> > > based on 'every node in the cluster being capable of running any of
> > > the designed flows'.  We do not have a design whereby we'd deploy
> > > certain flows on certain nodes such that other nodes wouldn't even
> > > know they exist.  However, partitioning the work to be done across a
> > > cluster is of course a very common thing.  For that we have concepts
> > > like 'primary node only' execution, and load-balanced connections
> > > with attribute-based affinity so that all data with a matching
> > > attribute ends up on the same node, etc.
> > >
> > > It would be very interesting to understand more about your use case,
> > > whereby you end up with hundreds of thousands of processors and would
> > > want node slicing of flows in the cluster.
> > >
> > > Thanks
> > >
> > > On Wed, Mar 8, 2023 at 9:31 AM Eric Secules <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Is there any upper limit on the number of processors I can have on 
> > > > my NiFi canvas?  Would 100,000 still be okay?  As I understand it, each 
> > > > processor takes up space on the heap as an instance of a class.
> > > >
> > > > If this is a problem, my idea would be to use multiple unclustered 
> > > > NiFi nodes and spread the flows evenly over them.
> > > >
> > > > It would be nice if I could use NiFi clustering and set a maximum 
> > > > replication factor on a process group, so that the flow inside it 
> > > > only executes on one or two of my clustered NiFi nodes.
> > > >
> > > > Thanks,
> > > > Eric
