Hi Mark,

Thanks for the quick response. I grepped the logs and did find several hits! I'll try increasing the disk space from 30GB to 128GB, and hopefully that will speed things up.
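In case it helps anyone else hitting this, a plain grep against the app log finds them; something along these lines (log path taken from my container, the same one that shows up in the tail output below):

    grep "archive file size constraints" /opt/nifi/nifi-current/logs/nifi-app*.log

The hits all look like this: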
2020-11-25 19:33:50,416 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:10,649 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:10,877 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:37:20,376 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:11,195 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:22,974 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:50:30,002 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:53:31,591 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:00,707 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:10,016 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:11,148 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 19:54:26,104 INFO [Timer-Driven Process Thread-5] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
tail: '/opt/nifi/nifi-current/logs/nifi-app.log' has been replaced; following new file
2020-11-25 20:01:27,376 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:06,446 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:06,485 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:07,354 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:10,001 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:08:10,816 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,145 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,150 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:11,157 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:14:20,002 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,638 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,807 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:06,909 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:17:25,955 INFO [Timer-Driven Process Thread-1] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:21:18,652 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:21:20,002 INFO [Timer-Driven Process Thread-5] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:47,868 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:48,224 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:48,225 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:22:49,451 INFO [Timer-Driven Process Thread-1] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:13,592 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:13,752 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:14,363 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
2020-11-25 20:26:20,003 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
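For reference, the change you're suggesting would be a one-line edit in nifi.properties; same property we currently have at 50%, with the 90% value from your note, keeping your caveat about bursty data in mind:

    nifi.content.repository.archive.max.usage.percentage=90%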
-Eric

On Wed, Nov 25, 2020 at 12:28 PM Mark Payne <[email protected]> wrote:
> Eric,
>
> As I was reading through your response, I was going to ask about how much of the storage space is used and what the value of nifi.content.repository.archive.max.usage.percentage is set to. If you look in the logs I'm guessing you'll see some logs like "Unable to write to container XYZ due to archive file size constraints; waiting for archive cleanup".
>
> If you're seeing that, what it's basically telling you is that the Content Repository is applying backpressure to prevent you from running out of disk space. If you set the "nifi.content.repository.archive.max.usage.percentage" property to, say, "90%" you'll probably see the performance improve and avoid the sporadic conditions that you're seeing. But depending on the bursty-ness of your data, at 90% you could potentially risk running out of disk space. Of course, if you're already sitting at 79% you may also want to just increase the amount of disk space that you have.
>
> Thanks
> -Mark
>
>
> > On Nov 25, 2020, at 3:21 PM, Eric Secules <[email protected]> wrote:
> >
> > Thanks for the tips Mark!
> >
> > I looked at the summary and there are a fair number of processors at the top of the list which create flowfiles from an outside source, but there are also several anomalies. For example, there are some processors mid-flow which usually process each flowfile in less than a second, but sometimes their average execution time balloons to several seconds, and that doesn't correlate with flowfile size. I see this behavior on network-IO-bound processors (most IO is done to docker containers on the same host), and even on data processors like ReplaceText (full text regex), where I saw execution time go up to 30 seconds even though the input file size and content are always the same (22.5KB). It usually takes a couple of milliseconds to process the same file.
> >
> > I see no backpressure in the flow, but depending on when I look at the summary I don't always see the anomalies I mentioned above; sometimes it looks totally acceptable. But other times I see processors like ReplaceText (which only has one concurrent task) be active for 4 of the past 5 minutes.
> >
> > I tried looking at the disk metrics in Azure, and it says we aren't near our quota of 120 IOPS and we do have burst capacity of up to 3100 IOPS. During testing we are steady at about 20 IOPS.
> >
> > All 3 file repositories (content, provenance, flowfile) are stored on the OS disk, which is at 79% capacity. Could constant pruning of the content repo be the cause of the intermittent slowness? We have the following setting: nifi.content.repository.archive.max.usage.percentage=50%
> >
> > Thanks,
> > Eric
> >
> > On Wed, Nov 25, 2020 at 6:31 AM Mark Payne <[email protected]> wrote:
> > Eric,
> >
> > There's nothing that I know of that went into 1.12.1 that would cause the dataflow to be any slower. And I'd expect to have heard about it from others if there were. There is a chance, though, that a particular processor that you're using is slower, due to a newer library perhaps, or code changes. Of note, I don't think it's necessarily more CPU intensive, if you're still seeing a load average of only 3.5 - that's quite low.
> >
> > My recommendation would be to run your test suite. Give it a good 15 minutes or so to get into the thick of things. Then look at two things to determine where the bottleneck is. You'll want to look for any backpressure first (the label on the Connection between processors would become red). That's a dead giveaway of where the bottleneck is, if that's kicking in. The next thing to check is going to the summary table (global menu, aka hamburger menu, and then Summary). Go to the processors tab and sort by task time. This will tell you which processors are taking the most time to run.
> >
> > In general, though, if backpressure is being applied, the destination of that connection is the bottleneck. If multiple connections in sequence have backpressure applied, look to the last one in the chain, as it's causing the backpressure to propagate back. If there is no backpressure applied, then that means that your data flow is able to keep up with the rate of data that's coming in. So that would imply that the source processor is not able to bring the data in as fast as you'd like. That could be due to NiFi (which would imply your disk is probably not fast enough, since clearly your CPU has more to give) or that the source of the data is not providing the data fast enough, etc. You could also try increasing the number of Concurrent Tasks on the source processor, and perhaps using a second thread will improve the performance.
> >
> > Thanks
> > -Mark
> >
> >
> >> On Nov 24, 2020, at 5:40 PM, Eric Secules <[email protected]> wrote:
> >>
> >> Hi Mark,
> >>
> >> Watching the video now, and will plan to watch more of the series. Thanks! As for questions:
> >>
> >> I have NiFi running in Docker on my MacBook Pro (I give the Docker VM 10 of my 12 cores) and on a smaller test environment, and I am seeing performance issues in both places. In my test environment we run it on a Standard D4s v3 (4 vcpus, 16 GiB memory) VM with a single 30GB Premium SSD (120 IOPS, 25 Mbps). NiFi also runs in a docker container. Right now we use the standard number of thread pool threads (10). At any given time, even if I increase the number of threads in the pool, I don't see the number of active processors go above 10, so I don't think increasing the size of the pool will help. My test VM has 4 cores and a load average of 3.5 over the past minute, and Azure monitoring shows me that the VM doesn't go above 50% average CPU usage while NiFi is under load. The disk is currently 70% full. Up until last month a full test suite would take about 30-40 minutes, and now it's pushing 4 hours. We started noticing tests taking a while shortly after upgrading NiFi to 1.12.1 from 1.11.4.
> >>
> >> We don't configure ridiculous amounts of concurrent tasks on processors. Is it possible that between 1.11.4 and 1.12.1 NiFi became a lot more CPU intensive?
> >>
> >> Thanks,
> >> Eric
> >>
> >>
> >> On Tue, Nov 24, 2020 at 6:55 AM Mark Payne <[email protected]> wrote:
> >> Eric,
> >>
> >> I don't think there's really any metric that exposes the specific numbers you're looking for. Certainly you could run a Java profiler and look at the results to see where all of the time is being spent. But that may get into more details than you're comfortable sorting through, depending on your knowledge of Java, profilers, and nifi internals.
> >>
> >> The nifi.bored.yield.duration is definitely an important property when you've got lots of processors that aren't really doing anything. You can increase that if you are okay adding potential latency into your dataflow. That said, 10 milliseconds is the default and generally works quite well, even with many thousands of processors. Of course, it also depends on how many CPU cores you have, etc.
> >>
> >> As for whether or not increasing the number of timer-driven threads will help, that very much depends on several things. How many threads are being used? How many CPU cores do you have? How many are being used? There is a series of videos on YouTube where I've discussed nifi anti-patterns. One of those [1] discusses how to tune the Timer-Driven Thread Pool, which may be helpful to you.
> >>
> >> Thanks
> >> -Mark
> >>
> >> [1] https://www.youtube.com/watch?v=pZq0EbfDBy4
> >>
> >>
> >>> On Nov 23, 2020, at 7:55 PM, Eric Secules <[email protected]> wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> I was wondering if there is a metric for the amount of time timer-driven processors spend in a queue, ready and waiting to be run. I use NiFi in an atypical way and my flow has over 2000 processors running on a single node, but there are usually fewer than 10 connections that have one or more flowfiles in them at any given time.
> >>>
> >>> I have a theory that the number of processors in use is slowing down the system overall, but I would need to see some more metrics to know whether that's the case and tell whether anything I am doing is helping. Are there some logs that I could look for or internal stats I could poke at with a debugger?
> >>>
> >>> Should I be able to see increased throughput by increasing the number of timer-driven threads, or is there a different mechanism responsible for going through all the runnable processors to see whether they have input to process? I also noticed "nifi.bored.yield.duration"; would it be good to increase the yield duration in this setting?
> >>>
> >>> Thanks,
> >>> Eric
