Hi Mark,

Thanks for the quick response. I grepped the logs and did find several
hits! I will try increasing the disk space from 30GB to 128GB and hopefully
that will speed things up.
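
For reference, this is roughly the search I ran (our NiFi runs in a Docker
container I'm calling "nifi" here, so adjust the container name and log path
for your setup):

  docker exec nifi grep "archive file size constraints" \
    /opt/nifi/nifi-current/logs/nifi-app.log

Here is a sample of what came back: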

2020-11-25 19:33:50,416 INFO [Timer-Driven Process Thread-4]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:10,649 INFO [Timer-Driven Process Thread-9]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:10,877 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:20,376 INFO [Timer-Driven Process Thread-2]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:11,195 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:22,974 INFO [Timer-Driven Process Thread-6]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:30,002 INFO [Timer-Driven Process Thread-7]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:53:31,591 INFO [Timer-Driven Process Thread-6]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:00,707 INFO [Timer-Driven Process Thread-4]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:10,016 INFO [Timer-Driven Process Thread-2]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:11,148 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:26,104 INFO [Timer-Driven Process Thread-5]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
tail: '/opt/nifi/nifi-current/logs/nifi-app.log' has been replaced;
 following new file
2020-11-25 20:01:27,376 INFO [Timer-Driven Process Thread-10]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:06,446 INFO [Timer-Driven Process Thread-6]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:06,485 INFO [Timer-Driven Process Thread-8]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:07,354 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:10,001 INFO [Timer-Driven Process Thread-2]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:10,816 INFO [Timer-Driven Process Thread-10]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,145 INFO [Timer-Driven Process Thread-9]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,150 INFO [Timer-Driven Process Thread-6]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,157 INFO [Timer-Driven Process Thread-4]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:20,002 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,638 INFO [Timer-Driven Process Thread-9]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,807 INFO [Timer-Driven Process Thread-4]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,909 INFO [Timer-Driven Process Thread-8]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:25,955 INFO [Timer-Driven Process Thread-1]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:21:18,652 INFO [Timer-Driven Process Thread-10]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:21:20,002 INFO [Timer-Driven Process Thread-5]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:47,868 INFO [Timer-Driven Process Thread-7]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:48,224 INFO [Timer-Driven Process Thread-2]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:48,225 INFO [Timer-Driven Process Thread-8]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:49,451 INFO [Timer-Driven Process Thread-1]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:13,592 INFO [Timer-Driven Process Thread-7]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:13,752 INFO [Timer-Driven Process Thread-3]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:14,363 INFO [Timer-Driven Process Thread-2]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:20,003 INFO [Timer-Driven Process Thread-4]
o.a.n.c.repository.FileSystemRepository Unable to write to container
default due to archive file size constraints; waiting for archive cleanup
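
If bumping the disk size alone doesn't do it, I'll also try raising the
archive threshold like you suggested. A rough sketch of the nifi.properties
change I have in mind (the 90% is just the example value from your mail, and
I believe the other two lines are the defaults we are already running):

  nifi.content.repository.archive.enabled=true
  nifi.content.repository.archive.max.retention.period=12 hours
  nifi.content.repository.archive.max.usage.percentage=90%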


-Eric


On Wed, Nov 25, 2020 at 12:28 PM Mark Payne <[email protected]> wrote:

> Eric,
>
> As I was reading through your response, I was going to ask about how much
> of the storage space is used and what the value of
> nifi.content.repository.archive.max.usage.percentage is set to.  If you
> look in the logs, I’m guessing you’ll see some log messages like “Unable to
> write to container XYZ due to archive file size constraints; waiting for
> archive cleanup”
>
> If you’re seeing that, what it’s basically telling you is that the Content
> Repository is applying backpressure to prevent you from running out of disk
> space. If you set the
> “nifi.content.repository.archive.max.usage.percentage” property to, say,
> “90%” you’ll probably see the performance improve and avoid the sporadic
> conditions that you’re seeing. But depending on the burstiness of your
> data, at 90% you could potentially risk running out of disk space. Of
> course, if you’re already sitting at 79% you may also want to just increase
> the amount of disk space that you have.
>
> Thanks
> -Mark
>
>
> > On Nov 25, 2020, at 3:21 PM, Eric Secules <[email protected]> wrote:
> >
> > Thanks for the tips Mark!
> >
> > I looked at the summary and there are a fair number of processors at the
> top of the list which create flowfiles from an outside source, but there
> are also several anomalies. For example, there are some processors
> mid-flow which usually process each flowfile in less than a second, but
> sometimes the average execution time balloons to several seconds and doesn't
> correlate with flowfile size. I see this behavior on network-IO-bound
> processors (most IO is done to docker containers on the same host), and
> even on data processors like ReplaceText (full-text regex), where I saw
> execution time go up to 30 seconds even though the input file size and
> content are always the same (22.5 KB). It usually takes a couple of
> milliseconds to process the same file.
> >
> > I see no backpressure in the flow, but depending on when I look at the
> summary I don't always see the anomalies mentioned above; sometimes it
> looks totally acceptable. But other times I see processors like ReplaceText
> (which only has one concurrent task) active for 4 of the past 5 minutes.
> >
> > I tried looking at the disk metrics in Azure, and it says we aren't near
> our quota of 120 IOPS and we do have burst capacity of up to 3100 IOPS.
> During testing we are steady at about 20 IOPS.
> >
> > All 3 file repositories (content, provenance, flow file) are stored on
> the OS disk which is at 79% capacity. Could constant pruning of the content
> repo be the cause of the intermittent slowness? We have the following
> setting: nifi.content.repository.archive.max.usage.percentage=50%
> >
> > Thanks,
> > Eric
> >
> > On Wed, Nov 25, 2020 at 6:31 AM Mark Payne <[email protected]> wrote:
> > Eric,
> >
> > There’s nothing that I know of that went into 1.12.1 that would cause
> the dataflow to be any slower. And I’d expect to have heard about it from
> others if there were. There is a chance, though, that a particular
> processor that you’re using is slower, due to a newer library perhaps, or
> code changes. Of note, I don’t think it’s necessarily more CPU intensive,
> if you’re still seeing a load average of only 3.5 - that’s quite low.
> >
> > My recommendation would be to run your test suite. Give it a good 15
> minutes or so to get into the thick of things. Then look at two things to
> determine where the bottleneck is. You’ll want to look for any backpressure
> first (the label on the Connection between processors would become red).
> That’s a dead giveaway of where the bottleneck is, if that’s kicking in.
> The next thing to check is going to the summary table (global menu, aka
> hamburger menu, and then Summary). Go to the processors tab and sort by
> task time. This will tell you which processors are taking the most time to
> run.
> >
> > In general, though, if backpressure is being applied, the destination of
> that connection is the bottleneck. If multiple connections in sequence have
> backpressure applied, look to the last one in the chain, as it’s causing
> the backpressure to propagate back. If there is no backpressure applied,
> then that means that your data flow is able to keep up with the rate of
> data that’s coming in. So that would imply that the source processor is not
> able to bring the data in as fast as you’d like. That could be due to NiFi
> (which would imply your disk is probably not fast enough, since clearly
> your CPU has more to give) or that the source of the data is not providing
> the data fast enough, etc. You could also try increasing the number of
> Concurrent Tasks on the source processor, and perhaps using a second thread
> will improve the performance.
> >
> > Thanks
> > -Mark
> >
> >
> >> On Nov 24, 2020, at 5:40 PM, Eric Secules <[email protected]> wrote:
> >>
> >> Hi Mark,
> >>
> >> Watching the video now, and will plan to watch more of the series.
> Thanks! As for questions,
> >>
> >> I have NiFi running in Docker on my MacBook Pro (I give the Docker VM 10
> of my 12 cores) and on a smaller test environment, and I am seeing
> performance issues in both places. In my test environment we run it on a
> Standard D4s v3 (4 vcpus, 16 GiB memory) VM with a single 30GB Premium SSD
> (120 IOPS, 25 MB/s). NiFi also runs in a docker container there. Right now
> we use the standard number of thread pool threads (10). Even if I increase
> the number of threads in the pool, I don't see the number of active
> processors go above 10 at any given time, so I don't think increasing the
> size of the pool will help. My test VM has 4 cores and a load average of
> 3.5 over the past minute, and Azure monitoring shows me that the VM doesn't
> go above 50% average CPU usage while NiFi is under load. The disk is
> currently 70% full. Up until last month a full test suite would take about
> 30-40 minutes, and now it's pushing 4 hours. We started noticing tests
> taking much longer shortly after upgrading NiFi from 1.11.4 to 1.12.1.
> >>
> >> We don't configure ridiculous numbers of concurrent tasks on
> processors. Is it possible that between 1.11.4 and 1.12.1 NiFi became a lot
> more CPU intensive?
> >>
> >> Thanks,
> >> Eric
> >>
> >>
> >>
> >> On Tue, Nov 24, 2020 at 6:55 AM Mark Payne <[email protected]>
> wrote:
> >> Eric,
> >>
> >> I don’t think there’s really any metric that exposes the specific
> numbers you’re looking for. Certainly you could run a Java profiler and
> look at the results to see where all of the time is being spent. But that
> may get into more details than you’re comfortable sorting through,
> depending on your knowledge of Java, profilers, and nifi internals.
> >>
> >> The nifi.bored.yield.duration is definitely an important property when
> you’ve got lots of processors that aren’t really doing anything. You can
> increase that if you are okay adding potential latency into your dataflow.
> That said, 10 milliseconds is the default and generally works quite well,
> even with many thousands of processors. Of course, it also depends on how
> many cpu cores you have, etc.
> >>
> >> As for whether or not increasing the number of timer-driven threads
> will help, that very much depends on several things. How many threads are
> being used? How many CPU cores do you have? How many are being used? There
> are a series of videos on YouTube where I’ve discussed nifi anti-patterns.
> One of those [1] discusses how to tune the Timer-Driven Thread Pool, which
> may be helpful to you.
> >>
> >> Thanks
> >> -Mark
> >>
> >> [1] https://www.youtube.com/watch?v=pZq0EbfDBy4
> >>
> >>
> >>> On Nov 23, 2020, at 7:55 PM, Eric Secules <[email protected]> wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> I was wondering if there was a metric for the amount of time
> timer-driven processors spend in a queue ready and waiting to be run. I use
> NiFi in an atypical way and my flow has over 2000 processors running on a
> single node, but there are usually fewer than 10 connections that have one
> or more flowfiles in them at any given time.
> >>>
> >>> I have a theory that the number of processors in use is slowing down
> the system overall. But I would need to see some more metrics to know
> whether that's the case and tell whether anything I am doing is helping.
> Are there some logs that I could look for or internal stats I could poke at
> with a debugger?
> >>>
> >>> Should I be able to see increased throughput by increasing the number
> of timer-driven threads, or is there a different mechanism responsible for
> going through all the runnable processors to see whether they have input to
> process? I also noticed "nifi.bored.yield.duration"; would it be good to
> increase the yield duration in this setting?
> >>>
> >>> Thanks,
> >>> Eric
> >>
> >
>
>
