Hi Mark,

It was because the main disk was filling up! We increased the disk size to 128GB and speed improved!

Thanks,
Eric
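For anyone landing on this thread with the same symptom: a quick way to confirm the condition is to check how full the volume holding the repositories is and count the archive-backpressure messages in the app log. This is only an illustrative sketch; the /opt/nifi/nifi-current path is taken from the tail output in the log excerpt quoted below (the default Docker image layout), so adjust it if your install lives elsewhere.

    # How full is the volume holding the NiFi repositories?
    df -h /opt/nifi/nifi-current
    # How often is the content repository refusing writes while waiting for archive cleanup?
    grep -c "archive file size constraints" /opt/nifi/nifi-current/logs/nifi-app.log

A count that keeps climbing while the flow is slow matches the backpressure behaviour Mark describes further down the thread.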
On Wed., Nov. 25, 2020, 12:34 p.m. Eric Secules, <esecu...@gmail.com> wrote:

> Hi Mark,
>
> Thanks for the quick response, I grepped the logs and did find several hits! I will try increasing the disk space from 30GB to 128GB and hopefully that will speed things up.
>
> 2020-11-25 19:33:50,416 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:37:10,649 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:37:10,877 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:37:20,376 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:50:11,195 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:50:22,974 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:50:30,002 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:53:31,591 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:54:00,707 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:54:10,016 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:54:11,148 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 19:54:26,104 INFO [Timer-Driven Process Thread-5] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> tail: '/opt/nifi/nifi-current/logs/nifi-app.log' has been replaced; following new file
> 2020-11-25 20:01:27,376 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:08:06,446 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:08:06,485 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:08:07,354 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:08:10,001 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:08:10,816 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:14:11,145 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:14:11,150 INFO [Timer-Driven Process Thread-6] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:14:11,157 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:14:20,002 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:17:06,638 INFO [Timer-Driven Process Thread-9] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:17:06,807 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:17:06,909 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:17:25,955 INFO [Timer-Driven Process Thread-1] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:21:18,652 INFO [Timer-Driven Process Thread-10] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:21:20,002 INFO [Timer-Driven Process Thread-5] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:22:47,868 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:22:48,224 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:22:48,225 INFO [Timer-Driven Process Thread-8] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:22:49,451 INFO [Timer-Driven Process Thread-1] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:26:13,592 INFO [Timer-Driven Process Thread-7] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:26:13,752 INFO [Timer-Driven Process Thread-3] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:26:14,363 INFO [Timer-Driven Process Thread-2] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
> 2020-11-25 20:26:20,003 INFO [Timer-Driven Process Thread-4] o.a.n.c.repository.FileSystemRepository Unable to write to container default due to archive file size constraints; waiting for archive cleanup
>
> -Eric
>
> On Wed, Nov 25, 2020 at 12:28 PM Mark Payne <marka...@hotmail.com> wrote:
>
>> Eric,
>>
>> As I was reading through your response, I was going to ask about how much of the storage space is used and what the value of nifi.content.repository.archive.max.usage.percentage is set to. If you look in the logs I'm guessing you'll see some logs like "Unable to write to container XYZ due to archive file size constraints; waiting for archive cleanup".
>>
>> If you're seeing that, what it's basically telling you is that the Content Repository is applying backpressure to prevent you from running out of disk space. If you set the "nifi.content.repository.archive.max.usage.percentage" property to, say, "90%" you'll probably see the performance improve and avoid the sporadic conditions that you're seeing. But depending on the burstiness of your data, at 90% you could potentially risk running out of disk space. Of course, if you're already sitting at 79% you may also want to just increase the amount of disk space that you have.
>>
>> Thanks
>> -Mark
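For reference, the knobs Mark is describing live in nifi.properties. The values below are a rough sketch of the stock settings (the exact defaults can vary by NiFi version; 50% is also the value Eric reports further down this thread):

    # nifi.properties - content repository archiving
    nifi.content.repository.archive.max.retention.period=12 hours
    nifi.content.repository.archive.max.usage.percentage=50%
    nifi.content.repository.archive.enabled=true

Raising the max usage percentage, as suggested, postpones the backpressure but leaves less headroom before the disk genuinely fills; increasing the disk size (as Eric ultimately did) attacks the same problem from the other side.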
>> > On Nov 25, 2020, at 3:21 PM, Eric Secules <esecu...@gmail.com> wrote:
>> >
>> > Thanks for the tips Mark!
>> >
>> > I looked at the summary and there are a fair number of processors at the top of the list which create flowfiles from an outside source, but there are also several anomalies. For example, there are some processors that are mid-flow which usually process each flowfile in less than a second, but sometimes average execution time balloons to several seconds and doesn't correlate with flowfile size. I see this behavior on network-IO-bound processors (most IO is done to docker containers on the same host), and even on data processors like ReplaceText (full text regex), where I saw execution time go up to 30 seconds even though the input file size and content is always the same (22.5KB). It usually takes a couple milliseconds to process the same file.
>> >
>> > I see no backpressure in the flow, but depending on when I look at the summary I don't always see the anomalies I mentioned above; sometimes it looks totally acceptable. But other times I see processors like ReplaceText (which only has one concurrent task) be active for 4 of the past 5 minutes.
>> >
>> > I tried looking at the disk metrics in Azure, and it says we aren't near our quota of 120 IOPS and we do have burst capacity of up to 3100 IOPS. During testing we are steady at about 20 IOPS.
>> >
>> > All 3 file repositories (content, provenance, flow file) are stored on the OS disk which is at 79% capacity. Could constant pruning of the content repo be the cause of the intermittent slowness? We have the following setting: nifi.content.repository.archive.max.usage.percentage=50%
>> >
>> > Thanks,
>> > Eric
>> >
>> > On Wed, Nov 25, 2020 at 6:31 AM Mark Payne <marka...@hotmail.com> wrote:
>> > Eric,
>> >
>> > There's nothing that I know of that went into 1.12.1 that would cause the dataflow to be any slower. And I'd expect to have heard about it from others if there were. There is a chance, though, that a particular processor that you're using is slower, due to a newer library perhaps, or code changes. Of note, I don't think it's necessarily more CPU intensive if you're still seeing a load average of only 3.5 - that's quite low.
>> >
>> > My recommendation would be to run your test suite. Give it a good 15 minutes or so to get into the thick of things. Then look at two things to determine where the bottleneck is. You'll want to look for any backpressure first (the label on the Connection between processors would become red). That's a dead giveaway of where the bottleneck is, if that's kicking in. The next thing to check is the summary table (global menu, aka hamburger menu, and then Summary). Go to the Processors tab and sort by task time. This will tell you which processors are taking the most time to run.
>> >
>> > In general, though, if backpressure is being applied, the destination of that connection is the bottleneck. If multiple connections in sequence have backpressure applied, look to the last one in the chain, as it's causing the backpressure to propagate back. If there is no backpressure applied, then that means that your dataflow is able to keep up with the rate of data that's coming in. So that would imply that the source processor is not able to bring the data in as fast as you'd like. That could be due to NiFi (which would imply your disk is probably not fast enough, since clearly your CPU has more to give) or that the source of the data is not providing the data fast enough, etc. You could also try increasing the number of Concurrent Tasks on the source processor; perhaps using a second thread will improve the performance.
>> >
>> > Thanks
>> > -Mark
>> >
>> >> On Nov 24, 2020, at 5:40 PM, Eric Secules <esecu...@gmail.com> wrote:
>> >>
>> >> Hi Mark,
>> >>
>> >> Watching the video now, and will plan to watch more of the series. Thanks! As for questions:
>> >>
>> >> I have NiFi running in docker on my macbook pro (I give the Docker VM 10 of my 12 cores) and on a smaller test environment; I am seeing performance issues in both places. In my test environment we run it on a Standard D4s v3 (4 vcpus, 16 GiB memory) VM with a single 30GB Premium SSD (120 IOPS, 25 Mbps). NiFi also runs in a docker container there. Right now we use the standard number of thread pool threads (10). At any given time, even if I increase the number of threads in the pool, I don't see the number of active processors go above 10, so I don't think increasing the size of the pool will help. My test VM has 4 cores and a load average of 3.5 over the past minute, and Azure monitoring shows me that the VM doesn't go above 50% average CPU usage while NiFi is under load. The disk is currently 70% full. Up until last month a full test suite would take about 30-40 minutes, and now it's pushing 4 hours. We started noticing tests taking a while shortly after upgrading NiFi to 1.12.1 from 1.11.4.
>> >>
>> >> We don't configure ridiculous numbers of concurrent tasks on our processors. Is it possible that between 1.11.4 and 1.12.1 NiFi became a lot more CPU intensive?
>> >>
>> >> Thanks,
>> >> Eric
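A simple way to cross-check the Azure-side disk metrics from inside the VM while a test suite runs is iostat from the sysstat package. This assumes sysstat is installed (or can be installed) on the host; it is only a sketch of one way to watch the disk, not a NiFi-specific tool.

    # Print extended device statistics every 5 seconds; watch r/s and w/s (IOPS)
    # and %util on the device backing the NiFi repositories.
    iostat -x 5

Sustained high %util on that device while CPU sits around 50% would point at the disk rather than the flow itself.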
>> >> On Tue, Nov 24, 2020 at 6:55 AM Mark Payne <marka...@hotmail.com> wrote:
>> >> Eric,
>> >>
>> >> I don't think there's really any metric that exposes the specific numbers you're looking for. Certainly you could run a Java profiler and look at the results to see where all of the time is being spent. But that may get into more details than you're comfortable sorting through, depending on your knowledge of Java, profilers, and nifi internals.
>> >>
>> >> The nifi.bored.yield.duration is definitely an important property when you've got lots of processors that aren't really doing anything. You can increase that if you are okay adding potential latency into your dataflow. That said, 10 milliseconds is the default and generally works quite well, even with many thousands of processors. Of course, it also depends on how many cpu cores you have, etc.
>> >>
>> >> As for whether or not increasing the number of timer-driven threads will help, that very much depends on several things. How many threads are being used? How many CPU cores do you have? How many are being used? There is a series of videos on YouTube where I've discussed nifi anti-patterns. One of those [1] discusses how to tune the Timer-Driven Thread Pool, which may be helpful to you.
>> >>
>> >> Thanks
>> >> -Mark
>> >>
>> >> [1] https://www.youtube.com/watch?v=pZq0EbfDBy4
>> >>
>> >>> On Nov 23, 2020, at 7:55 PM, Eric Secules <esecu...@gmail.com> wrote:
>> >>>
>> >>> Hello everyone,
>> >>>
>> >>> I was wondering if there was a metric for the amount of time timer-driven processors spend in a queue, ready and waiting to be run. I use NiFi in an atypical way and my flow has over 2000 processors running on a single node, but there are usually fewer than 10 connections that have one or more flowfiles in them at any given time.
>> >>>
>> >>> I have a theory that the number of processors in use is slowing down the system overall. But I would need to see some more metrics to know whether that's the case and to tell whether anything I am doing is helping. Are there some logs that I could look for, or internal stats I could poke at with a debugger?
>> >>>
>> >>> Should I be able to see increased throughput by increasing the number of timer-driven threads, or is there a different mechanism responsible for going through all the runnable processors to see whether they have input to process? I also noticed "nifi.bored.yield.duration"; would it be good to increase the yield duration in this setting?
>> >>>
>> >>> Thanks,
>> >>> Eric
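To make the bored-yield suggestion above concrete: the property lives in nifi.properties, and the default is the 10 ms Mark quotes. The higher value shown below is purely illustrative (an assumption for the example, not a recommendation); the right number depends on how much added latency the flow can tolerate.

    # nifi.properties
    # Default: a processor with no work to do yields this long before being scheduled again.
    nifi.bored.yield.duration=10 millis
    # Example of trading a little latency for fewer no-op scheduling passes:
    # nifi.bored.yield.duration=25 millis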