Thanks Phil and Joe,

Both those tips should help a lot!

-Eric

On Fri, Mar 10, 2023 at 9:34 AM Phillip Lord <[email protected]> wrote:

> Eric,
>
> Just wanted to add some thoughts…
>
> To help manage that many components I’d definitely recommend
> modifying the “nifi.bored.yield.duration” setting.  The default is 10ms…
> I’d recommend increasing this considerably if you’re planning to have tens
> of thousands of running components on a single canvas.  This setting controls
> how often a component checks whether it has work to do… increasing the bored
> duration reduces how often idle components check for work.
>
> It “might” introduce some additional latency to flows, but once a
> component understands it has data to work on, it will then continue to run
> based upon the component’s run schedule.
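For reference, that setting lives in nifi.properties. A minimal sketch of the change (the 500 ms value below is purely illustrative, not a recommendation tuned for any particular deployment):

```properties
# nifi.properties -- how long an idle component yields before
# re-checking for work. Default is "10 millis"; a larger value
# (illustrative only) trades a little latency for far fewer wake-ups
# across tens of thousands of mostly idle components.
nifi.bored.yield.duration=500 millis
```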
>
> Also… I’d recommend splitting your flows across separate NiFi
> instances, and/or consolidating some functionality where you can… I don’t
> know how you keep track of that many components, but it sounds like a
> headache :)
>
> Thanks,
> Phil
> On Mar 10, 2023 at 11:44 AM -0500, Joe Witt <[email protected]>, wrote:
>
> NiFi attempts to schedule every processor to run as often as it
> asks.  If it asks to run every 0 seconds, that translates to 'run
> pretty darn often/fast'.  However, we don't actually invoke the code
> in most cases, because the check for 'is there work to do' will fail when
> there is no FlowFile sitting in the input queue.  So you don't really burn
> resources meaningfully in that model.  This is part of why it scales
> so well even with so many flows all on the same nodes all the time.
> But you might want to lower the scheduled run frequency of processors
> that source data, as those will always say 'there is work to do'.
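The scheduling behavior Joe describes can be sketched loosely in Python. This is not NiFi's actual scheduler code; the `Processor` class, `has_work`, and `trigger` names are invented for illustration. The point is that a cheap "is there work to do" check gates the real work, so idle processors with a 0-second run schedule cost almost nothing, while source processors run on every pass:

```python
# Loose sketch of NiFi-style scheduling -- illustrative only, not the
# real implementation. A processor is *offered* a chance to run on every
# scheduling pass, but its work is skipped when no FlowFile is queued
# (unless it is a source processor, which always claims to have work).
from collections import deque

class Processor:
    def __init__(self, name, is_source=False):
        self.name = name
        self.is_source = is_source
        self.input_queue = deque()   # incoming FlowFiles
        self.invocations = 0         # times the "real work" actually ran

    def has_work(self):
        # Source processors always report work; others need a queued FlowFile.
        return self.is_source or bool(self.input_queue)

    def trigger(self):
        # Called on every scheduling pass; this cheap check is what keeps
        # thousands of idle processors from burning resources.
        if not self.has_work():
            return
        self.invocations += 1
        if self.input_queue:
            self.input_queue.popleft()

idle = Processor("idle-transform")
busy = Processor("busy-transform")
busy.input_queue.extend(["flowfile-1", "flowfile-2"])
source = Processor("generate-flowfile", is_source=True)

# Simulate many scheduling passes (a run schedule of "0 sec").
for _ in range(1000):
    for p in (idle, busy, source):
        p.trigger()

print(idle.invocations, busy.invocations, source.invocations)  # → 0 2 1000
```

Note how the idle processor is offered 1000 chances to run but never actually invoked, while the source processor runs every time, which is why lowering the run schedule of source processors matters.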
>
> Thanks
>
> On Fri, Mar 10, 2023 at 9:26 AM Eric Secules <[email protected]> wrote:
>
>
> Hi Joe,
>
> Thanks for the reply. The reasoning behind my use case for node-slicing of
> flows is the assumption that I would otherwise need several VMs with
> higher memory allocation for them to hold all of the flows and still have
> room for active FlowFiles, plus processing capacity to handle the
> traffic. I expect traffic to have a daily peak and then taper off to zero
> activity. I certainly don't expect all processors to have FlowFiles in
> their input queues at all times. A couple of flows I expect to process a
> million FlowFiles a day, while others might see only a few hundred. They're
> all configured to run every 0 seconds. Does the scheduler try to run them
> all, or does it only run processors that have FlowFiles in the input queue
> and processors that have no input?
>
> Thanks,
> Eric
>
> On Thu, Mar 9, 2023 at 10:32 AM Joe Witt <[email protected]> wrote:
>
>
> Eric
>
> There is a practical limit in terms of memory, browser performance,
> etc., but there isn't otherwise any real hard limit set.  We've
> seen flows with many tens of thousands of processors that are part of
> many dozens or hundreds of process groups.  But the
> challenge that comes up is less about the number of components than the
> sheer reality of running that many different flows within a single
> host system.  Now, sometimes people with flows like that don't have
> actual live/high-volume streams through all of them all the time.
> Often that pattern is used for more job/scheduled-type flows that run
> periodically.  That is different and can work out, depending on time
> slicing, etc.
>
> The entire notion of how NiFi's clustering is designed and works is
> based on 'every node in the cluster being capable of running any of
> the designed flows'.  We do not have a design whereby we'd deploy
> certain flows on certain nodes such that other nodes wouldn't even
> know they exist.  However, partitioning the work to be done
> across a cluster is of course a very common thing.  For that we have
> concepts like 'primary node only' execution, and load-balanced
> connections with attribute-based affinity, so that all data
> with a matching attribute ends up on the same node, etc.
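The attribute-affinity idea Joe mentions can be sketched with a stable hash. This is a toy model, not NiFi's implementation; the node names and `node_for` helper are invented for illustration. The key property is that every FlowFile carrying the same attribute value is routed to the same node:

```python
# Toy sketch of attribute-based partitioning across cluster nodes --
# illustrative only, not NiFi's actual load-balancing code.
import zlib

nodes = ["node-1", "node-2", "node-3"]

def node_for(attribute_value):
    # A stable (non-random) hash ensures that the same attribute value
    # always maps to the same node, giving per-attribute affinity.
    return nodes[zlib.crc32(attribute_value.encode()) % len(nodes)]

# Two FlowFiles with the same attribute value land on the same node.
assignments = {v: node_for(v) for v in ("customer-a", "customer-b")}
print(assignments)
```

The stable hash is the design point: a random or round-robin assignment would spread matching data across nodes, defeating the affinity guarantee.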
>
> It would be very interesting to understand more about your use case
> whereby you end up with 100s of thousands of processors and would want
> node slicing of flows in the cluster.
>
> Thanks
>
> On Wed, Mar 8, 2023 at 9:31 AM Eric Secules <[email protected]> wrote:
>
>
> Hello,
>
> Is there any upper limit on the number of processors that I can have on my
> NiFi canvas? Would 100000 still be okay? As I understand it, each processor
> takes up space on the heap as an instance of a class.
>
> If this is a problem, my idea would be to use multiple unclustered NiFi
> nodes and spread the flows evenly over them.
>
> It would be nice if I could use NiFi clustering and set a maximum
> replication factor on a process group so that the flow inside it only
> executes on one or two of my clustered nifi nodes.
>
> Thanks,
> Eric
>
>
