Thanks Phil and Joe,

Both those tips should help a lot!
-Eric

On Fri, Mar 10, 2023 at 9:34 AM Phillip Lord <[email protected]> wrote:
> Eric,
>
> Just wanted to add some thoughts...
>
> In order to help "manage" that many components, I'd definitely recommend
> modifying the "nifi.bored.yield.duration" setting. The default is 10 ms.
> I'd recommend increasing this considerably if you're planning to have
> tens of thousands of running components on a single canvas. This setting
> controls how often a component checks whether it has work to do, so
> increasing its bored duration reduces the amount of time components
> spend checking for work.
>
> It "might" introduce some additional latency to flows, but once a
> component knows it has data to work on, it will then continue to run
> based on the component's run schedule.
>
> Also, I'd recommend breaking your flows down into separate instances,
> and/or maybe looking to consolidate some functionality. I don't know how
> you keep track of that many components, but it sounds like a headache :)
>
> Thanks,
> Phil
>
> On Mar 10, 2023 at 11:44 AM -0500, Joe Witt <[email protected]> wrote:
> > Every processor is scheduled to run as often as it asks. If it asks
> > to run every 0 seconds, that translates to "run pretty darn
> > often/fast". However, we don't actually invoke the code in most
> > cases, because the "is there work to do" check will fail when there
> > is no flowfile sitting there. So you wouldn't really burn resources
> > meaningfully in that model. This is part of why it scales so well,
> > with so many flows all on the same nodes all the time. But you might
> > want to lower the scheduled run frequency of processors that source
> > data, as those will always say "there is work to do".
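[Editor's note: Phil's tip corresponds to a single property in nifi.properties. A minimal sketch of the change; the 10 ms default comes from the thread above, while the raised value below is only an illustrative assumption, not a tuning recommendation.]

```
# nifi.properties
# Default is "10 millis". A component with no work to do yields this
# long before checking again; raising it reduces idle polling across
# tens of thousands of components, at the cost of some added latency
# before a newly arrived flowfile is noticed. The value below is an
# example only.
nifi.bored.yield.duration=500 millis
```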
> > Thanks
> >
> > On Fri, Mar 10, 2023 at 9:26 AM Eric Secules <[email protected]> wrote:
> > > Hi Joe,
> > >
> > > Thanks for the reply. The reasoning behind my use case for
> > > node-slicing of flows is the assumption that I would otherwise need
> > > several VMs with higher memory allocation for them to hold all of
> > > the flows, still have room for active flowfiles, and also have the
> > > processing capacity to handle the traffic. I expect traffic to have
> > > a daily peak and then taper off to zero activity. I certainly don't
> > > expect all processors to have flowfiles in their input queues at all
> > > times. A couple of flows I expect to process a million flowfiles a
> > > day, while others might see only a few hundred. They're all
> > > configured to run every 0 seconds. Does the scheduler try to run
> > > them all, or does it only run processors that have flowfiles in the
> > > input queue and processors that have no input?
> > >
> > > Thanks,
> > > Eric
> > >
> > > On Thu, Mar 9, 2023 at 10:32 AM Joe Witt <[email protected]> wrote:
> > > > Eric,
> > > >
> > > > There is a practical limit in terms of memory, browser
> > > > performance, etc., but there isn't otherwise any real hard limit
> > > > set. We've seen flows with many tens of thousands of processors
> > > > that are part of what can be many dozens or hundreds of process
> > > > groups. But the challenge that comes up is less about the number
> > > > of components and more about the sheer reality of running that
> > > > many different flows within a single host system. Sometimes
> > > > people building flows like that don't have actual live/high-volume
> > > > streams through all of them all the time; often the pattern is
> > > > more job/scheduled-type flows that run periodically. That is
> > > > different and can work out, depending on time slicing, etc.
> > > >
> > > > The entire notion of how NiFi's clustering is designed and works
> > > > is based on every node in the cluster being capable of running any
> > > > of the designed flows. We do not have a design whereby we'd deploy
> > > > certain flows on certain nodes such that other nodes wouldn't even
> > > > know they exist.
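[Editor's note: the scheduling model Joe describes can be sketched as follows. This is illustrative pseudologic, not NiFi's actual scheduler code; the dict fields and function names are hypothetical.]

```python
# Sketch of the "run every 0 seconds, but only invoke when there is
# work" model: every processor is offered a chance to run on each
# scheduler pass, but its code is only invoked if work is present.

def tick(processors):
    """One scheduler pass; returns the names of processors actually invoked."""
    invoked = []
    for p in processors:
        # Source processors have no input queue, so they always report
        # work to do (which is why lowering their run frequency helps).
        # Queue-fed processors only run when a flowfile is waiting.
        if p["is_source"] or p["input_queue"]:
            invoked.append(p["name"])
    return invoked

procs = [
    {"name": "GetFile",     "is_source": True,  "input_queue": []},
    {"name": "RouteOnAttr", "is_source": False, "input_queue": []},
    {"name": "PutFile",     "is_source": False, "input_queue": ["ff1"]},
]
print(tick(procs))  # only the source and the processor with queued work
```

This is why "run every 0 seconds" on thousands of idle processors does not burn meaningful CPU: the invocation is skipped when the queue is empty.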
> > > > However, partitioning the work to be done across a cluster is of
> > > > course a very common thing. For that we have concepts like
> > > > "primary node only" execution, and load-balanced connections with
> > > > attribute-based affinity, so that all data with a matching
> > > > attribute ends up on the same node.
> > > >
> > > > It would be very interesting to understand more about your use
> > > > case, whereby you end up with hundreds of thousands of processors
> > > > and would want node-slicing of flows in the cluster.
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Mar 8, 2023 at 9:31 AM Eric Secules <[email protected]> wrote:
> > > > > Hello,
> > > > >
> > > > > Is there any upper limit on the number of processors that I can
> > > > > have in my NiFi canvas? Would 100,000 still be okay? As I
> > > > > understand it, each processor takes up space on the heap as an
> > > > > instance of a class.
> > > > >
> > > > > If this is a problem, my idea would be to use multiple
> > > > > unclustered NiFi nodes and spread the flows evenly over them.
> > > > >
> > > > > It would be nice if I could use NiFi clustering and set a
> > > > > maximum replication factor on a process group so that the flow
> > > > > inside it only executes on one or two of my clustered NiFi
> > > > > nodes.
> > > > >
> > > > > Thanks,
> > > > > Eric
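[Editor's note: the attribute-based affinity Joe mentions can be sketched with a stable hash. This is only an illustration of the idea, not NiFi's implementation; the function name and hash choice are assumptions.]

```python
# Attribute-affinity sketch: flowfiles carrying the same attribute
# value are always routed to the same cluster node, so related data
# is processed together.
import zlib

def node_for(attribute_value: str, node_count: int) -> int:
    # A stable (non-randomized) hash keeps the mapping consistent
    # across restarts; assignments shift only if the cluster resizes.
    return zlib.crc32(attribute_value.encode()) % node_count

nodes = 3
a = node_for("customer-42", nodes)
b = node_for("customer-42", nodes)
assert a == b  # same attribute value -> same node
```

Combined with "primary node only" for singleton tasks, this is how work gets partitioned without any node needing a flow the others lack.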
