Every processor is scheduled to run as often as it asks.  If it asks
to run every 0 seconds, that translates to 'run pretty darn
often/fast'.  However, in most cases we don't actually invoke the
code, because the 'is there work to do' check will fail when there is
no flowfile sitting there.  So you don't really burn resources
meaningfully in that model.  This is part of why it scales so well
even with so many flows all on the same nodes all the time.  But you
might want to lower the scheduled run frequency of processors that
source data, as those will always say 'there is work to do'.
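
For what it's worth, here is a rough sketch of the pattern a custom
processor's onTrigger typically follows (an illustrative example only,
not code from any particular bundled processor), which is where that
'is there work to do' behavior shows up:

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ExampleProcessor extends AbstractProcessor {

        // hypothetical relationship name, just for the example
        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .build();

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            // The framework normally skips triggering a processor whose
            // incoming queues are all empty, and most processors guard
            // explicitly as well:
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return; // nothing queued, so effectively no work and no cost
            }
            // ... do the actual work on the flowfile here ...
            session.transfer(flowFile, REL_SUCCESS);
        }
    }

Source processors have no incoming connection, so that check never
holds them back, which is why they are the ones worth slowing down.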

Thanks

On Fri, Mar 10, 2023 at 9:26 AM Eric Secules <[email protected]> wrote:
>
>  Hi Joe,
>
> Thanks for the reply. The reasoning behind my use case for node-slicing of 
> flows is the assumption that I would otherwise need several VMs with higher 
> memory allocation to hold all of the flows, still have room for active 
> flowfiles, and have the processing capacity to handle the traffic. I 
> expect traffic to have a daily peak and then taper off to zero activity. I 
> certainly don't expect all processors to have flowfiles in their input queues 
> at all times. A couple of flows I expect to process a million flowfiles a day 
> while others might see only a few hundred. They're all configured to run 
> every 0 seconds. Does the scheduler try to run them all, or does it only run 
> processors that have flowfiles in the input queue and processors that have no 
> input?
>
> Thanks,
> Eric
>
> On Thu, Mar 9, 2023 at 10:32 AM Joe Witt <[email protected]> wrote:
>>
>> Eric
>>
>> There is a practical limit in terms of memory, browser performance,
>> etc...   But there isn't otherwise any real hard limit set.  We've
>> seen flows with many 10s of thousands of processors that are part of
>> what can be many dozens or hundreds of process groups.  But the
>> challenge that comes up is less about the number of components and
>> more about the sheer reality of running that many different flows within a single
>> host system.  Now sometimes people doing flows like that don't have
>> actual live/high volume streams through all of those all the time.
>> Often that is used for more job/scheduled type flows that run
>> periodically.  That is different and can work out depending on time
>> slicing/etc..
>>
>> The entire notion of how NiFi's clustering is designed and works is
>> based on 'every node in the cluster being capable of running any of
>> the designed flows'.  We do not have a design whereby we'd deploy
>> certain flows on certain nodes such that other nodes wouldn't even
>> know they exist.  However, of course partitioning the work to be
>> done across a cluster is a very common thing.  For that we have
>> concepts like 'primary node only' execution, and load balanced
>> connections with attribute based affinity so that all data with a
>> matching attribute ends up on the same node/etc..
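>>
>> As a rough illustration of the affinity idea only (this is not NiFi's
>> actual implementation, just the concept behind the 'partition by
>> attribute' load balancing strategy), it boils down to something like:
>>
>>     // conceptual sketch: the same attribute value always maps to the
>>     // same node index, so matching data lands on the same node
>>     static int nodeForAttribute(String attributeValue, int clusterSize) {
>>         return Math.floorMod(attributeValue.hashCode(), clusterSize);
>>     }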
>>
>> It would be very interesting to understand more about your use case
>> whereby you end up with 100s of thousands of processors and would want
>> node slicing of flows in the cluster.
>>
>> Thanks
>>
>> On Wed, Mar 8, 2023 at 9:31 AM Eric Secules <[email protected]> wrote:
>> >
>> > Hello,
>> >
>> > Is there any upper limit on the number of processors that I can have in my 
>> > nifi canvas? Would 100000 still be okay? As I understand it, each 
>> > processor takes up space on the heap as an instance of a class.
>> >
>> > If this is a problem, my idea would be to use multiple unclustered nifi 
>> > nodes and spread the flows evenly over them.
>> >
>> > It would be nice if I could use nifi clustering and set a maximum 
>> > replication factor on a process group so that the flow inside it only 
>> > executes on one or two of my clustered nifi nodes.
>> >
>> > Thanks,
>> > Eric
