Hi Mark,

Watching the video now, and will plan to watch more of the series. Thanks!
As for questions,

I have NiFi running in Docker on my MacBook Pro (the Docker VM gets 10 of
my 12 cores) and in a smaller test environment, and I am seeing performance
issues in both places. The test environment runs NiFi in a Docker container
on a Standard D4s v3 VM (4 vCPUs, 16 GiB memory) with a single 30 GB
Premium SSD (120 IOPS, 25 MB/s). Right now we use the default thread pool
size (10 timer-driven threads). Even if I increase the pool size, I never
see the number of active processors go above 10, so I don't think a bigger
pool will help. The test VM has 4 cores and a 1-minute load average of 3.5,
and Azure monitoring shows the VM doesn't exceed 50% average CPU usage
while NiFi is under load. The disk is currently 70% full.
Up until last month a full test suite would take about 30-40 minutes, and
now it's pushing 4 hours. We started noticing tests slowing down shortly
after upgrading NiFi from 1.11.4 to 1.12.1.
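
In case it's useful, here's the kind of quick check I'm using for the load
figure (a generic sketch, nothing NiFi-specific; assumes a Linux VM with
coreutils):

```shell
# Compare the 1-minute load average against the core count; a load well
# below the core count suggests the box is not CPU-bound.
cores=$(nproc)
load1m=$(cut -d' ' -f1 /proc/loadavg)
echo "cores=$cores load1m=$load1m"
```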

We don't configure unreasonable numbers of concurrent tasks on our
processors. Is it possible that NiFi became significantly more CPU
intensive between 1.11.4 and 1.12.1?

Thanks,
Eric



On Tue, Nov 24, 2020 at 6:55 AM Mark Payne <[email protected]> wrote:

> Eric,
>
> I don’t think there’s really any metric that exposes the specific numbers
> you’re looking for. Certainly you could run a Java profiler and look at the
> results to see where all of the time is being spent. But that may get into
> more details than you’re comfortable sorting through, depending on your
> knowledge of Java, profilers, and NiFi internals.
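>
> If you want a lighter-weight starting point than a full profiler, NiFi
> can write a thread dump for you. Taking a couple of dumps a few seconds
> apart and looking at what the Timer-Driven threads are doing is often
> enough. A rough sketch (run from the NiFi install directory; the file
> names are just placeholders):
>
> ```
> ./bin/nifi.sh dump dump1.txt
> sleep 10
> ./bin/nifi.sh dump dump2.txt
> grep "Timer-Driven Process Thread" dump1.txt
> ```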
>
> The nifi.bored.yield.duration is definitely an important property when
> you’ve got lots of processors that aren’t really doing anything. You can
> increase that if you are okay adding potential latency into your dataflow.
> That said, 10 milliseconds is the default and generally works quite well,
> even with many thousands of processors. Of course, it also depends on how
> many CPU cores you have, etc.
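>
> For reference, the property lives in conf/nifi.properties; e.g. (25
> millis here is just an illustrative value, not a recommendation):
>
> ```
> # conf/nifi.properties
> nifi.bored.yield.duration=25 millis
> ```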
>
> As for whether or not increasing the number of timer-driven threads will
> help, that very much depends on several things. How many threads are being
> used? How many CPU cores do you have? How many are being used? There are a
> series of videos on YouTube where I’ve discussed NiFi anti-patterns. One of
> those [1] discusses how to tune the Timer-Driven Thread Pool, which may be
> helpful to you.
>
> Thanks
> -Mark
>
> [1] https://www.youtube.com/watch?v=pZq0EbfDBy4
>
>
> On Nov 23, 2020, at 7:55 PM, Eric Secules <[email protected]> wrote:
>
> Hello everyone,
>
> I was wondering if there was a metric for the amount of time timer-driven
> processors spend in a queue ready and waiting to be run. I use NiFi in an
> atypical way and my flow has over 2000 processors running on a single node,
> but there are usually less than 10 connections that have one or more
> flowfiles in them at any given time.
>
> I have a theory that the number of processors in use is slowing down the
> system overall. But I would need to see some more metrics to know whether
> that's the case and tell whether anything I am doing is helping. Are there
> some logs that I could look for or internal stats I could poke at with a
> debugger?
>
> Should I be able to see increased throughput by increasing the number of
> timer-driven threads, or is there a different mechanism responsible for
> going through all the runnable processors to see whether they have input
> to process? I also noticed "nifi.bored.yield.duration"; would it be good
> to increase the yield duration in this setting?
>
> Thanks,
> Eric
>
>
>
