I have the back pressure object threshold set to 100000 on that queue, and my swap threshold is 200000. That said, I don't think the number of flow files in that queue was very high when I had the issue: the bottleneck was at updaterecord, which sits after a mergecontent that had already greatly reduced the number of flow files.
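For reference, here is a minimal sketch of where the swap threshold mentioned above would live, assuming it was changed through the standard `nifi.queue.swap.threshold` entry in nifi.properties. The back pressure object threshold is configured on the connection itself, so it appears here only as a comment. The values are the ones quoted in this message; everything else in the file is left out.

```properties
# nifi.properties (sketch; only the setting discussed in this message)

# Queue size at which NiFi starts swapping FlowFile information to disk.
# The shipped default is 20000; this message describes it raised to 200000.
nifi.queue.swap.threshold=200000

# The back pressure object threshold (100000 on the queue in question)
# is set per connection in the connection's settings, not in this file.
```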
On Mon, Jun 1, 2020, 16:02 Mark Payne <[email protected]> wrote:

> Hey Robert,
>
> How big are the FlowFile queues that you have in front of your
> MergeContent/MergeRecord processors? Or, more specifically, what do you
> have configured for the back pressure threshold? I ask because there was a
> fix in 1.11.0 [1] that had to do with ordering when swapping and ensuring
> that data remains in the same order after being swapped out and swapped
> back in when using the FIFO prioritizer.
>
> Some of the changes there can actually change the thresholds when we
> perform swapping. So I’m curious if you’re seeing a lot of swapping of
> FlowFiles to/from disk when running in 1.11.4 that you didn’t have in
> 1.9.2. Are you seeing logs about swapping occurring? And of note, when I
> talk about swapping, I’m talking about NiFi-level FlowFile swapping, not
> OS-level swapping.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7011
>
>
> On May 22, 2020, at 10:35 AM, Robert R. Bruno <[email protected]> wrote:
>
> Sorry one other thing I thought of that may help. I noticed on 1.11.4
> when I would stop the updaterecord processor it would take a long period of
> time for the processor to stop (threads were hanging), but when I went back
> to 1.9.2 the processor would stop in a very timely manner. Not sure if
> that helps, but just another data point.
>
> On Fri, May 22, 2020 at 9:22 AM Robert R. Bruno <[email protected]> wrote:
>
>> I had more updates on this.
>>
>> Yesterday I again attempted to upgrade one of our 1.9.2 clusters that is
>> now using mergecontent vs mergerecord. The flow had been running on 1.9.2
>> for about a week with no issue. I did the upgrade to 1.11.4, and saw about
>> 3 of 10 nodes not being able to keep up. The load on these 3 nodes became
>> very high. For perspective, a load of 80 is about as high as we like to
>> see these boxes, and some were getting as high as 120. I saw one
>> bottleneck forming at an updaterecord. I tried giving that processor a few
>> more threads to see if it would help work off the backlog. No matter what
>> I tried (lowering threads, changing mergecontent sizes, etc.) the load
>> wouldn't go down on those 3 boxes and they had either a slowly growing
>> backlog or would maintain the backlog they had.
>>
>> I then decided to downgrade the nifi back to 1.9.2 without rebooting the
>> boxes. I kept all flow files and content as they were. Upon downgrading
>> no loads were above 50 and this was only on the boxes that had the backlog
>> that formed when we did the upgrade. The backlog on the 3 boxes worked off
>> with no issue at all, and without me having to make changes to the flow.
>> Once backlogs were worked off then our loads all sat around 20.
>>
>> This is a similar behavior to what we saw before, but just in another
>> part of the flow. Has anyone else seen anything like this on 1.11.4?
>> Unfortunately for now we can't upgrade due to this problem. Any thoughts
>> from anyone would be greatly appreciated.
>>
>> Thanks,
>> Robert
>>
>> On Fri, May 8, 2020 at 4:47 PM Robert R. Bruno <[email protected]> wrote:
>>
>>> Sorry for the delayed answer, but was doing some testing this week and
>>> found a few more things out.
>>>
>>> First to answer some of your questions.
>>>
>>> I would say with no actual raw numbers, it was worse than a 10%
>>> degradation. I say this since the flow was badly backing up, and a 10%
>>> decrease in performance should not have caused this since normally we can
>>> work off a backlog of data with no issues. I looked at my mergerecord
>>> settings, and I am largely using size as the limiting factor. I have a max
>>> size of 4MB and a max bin age of 1 minute followed by a second mergerecord
>>> with a max size of 32MB and a max bin age of 5 minutes.
>>>
>>> I changed our flow a bit on a test system that was running 1.11.4, and
>>> discovered the following:
>>>
>>> I changed mergerecords to mergecontents. I used pretty much all of the
>>> same settings in the mergecontent but had the mergecontent deal with the
>>> avro natively. In this flow, it currently seems like I don't need to chain
>>> multiple mergecontents together like I did with mergerecords.
>>>
>>> I then fed the merged avro from the mergecontent to a convertrecord to
>>> convert the data to parquet. The convertrecord was tremendously slower
>>> than the mergecontent and became a bottleneck. I then switched the
>>> convertrecord to the convertavrotoparquet processor. Convertavrotoparquet
>>> can easily handle the output speed of the mergecontent and then some.
>>>
>>> My hope is to make these changes to our actual flow soon, and then
>>> upgrade to 1.11.4 again. I'll let you know how that goes.
>>>
>>> Thanks,
>>> Robert
>>>
>>> On Mon, Apr 27, 2020 at 9:26 AM Mark Payne <[email protected]> wrote:
>>>
>>>> Robert,
>>>>
>>>> What kind of performance degradation were you seeing here? I put
>>>> together some simple flows to see if I could reproduce using 1.9.2 and
>>>> current master.
>>>> My flow consisted of GenerateFlowFile (generating 2 CSV rows per
>>>> FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro)
>>>> -> UpdateAttribute to try to mimic what you’ve got, given the details that
>>>> I have.
>>>>
>>>> I did see a performance degradation on the order of about 10%. So on my
>>>> laptop I went from processing 2.49 MM FlowFiles in 1.9.2 in 5 mins to 2.25
>>>> MM on the master branch. Interestingly, I saw no real change when I enabled
>>>> Snappy compression.
>>>>
>>>> For a point of reference, I also tried removing MergeRecord and just
>>>> Generate -> Convert -> UpdateAttribute. I saw the same roughly 10%
>>>> performance degradation.
>>>>
>>>> I’m curious if you’re seeing more than that. If so, I think a template
>>>> would be helpful to understand what’s different.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>>
>>>> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <[email protected]> wrote:
>>>>
>>>> Joe,
>>>>
>>>> In that part of the flow, we are using avro readers and writers. We
>>>> are using snappy compression (which could be part of the problem). Since
>>>> we are using avro at that point the embedded schema is being used by the
>>>> reader and the writer is using the schema name property along with an
>>>> internal schema registry in nifi.
>>>>
>>>> I can see what could potentially be shared.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <[email protected]> wrote:
>>>>
>>>>> Robert,
>>>>>
>>>>> Can you please detail the record readers and writers involved and how
>>>>> schemas are accessed? There can be very important performance-related
>>>>> changes in the parsers/serializers of the given formats. And we've added a
>>>>> lot to make schema caching really capable but you have to opt into it. It
>>>>> is of course possible MergeRecord itself is the culprit for performance
>>>>> reduction but let's get a more full picture here.
>>>>>
>>>>> Are you able to share a template and sample data which we can use to
>>>>> replicate?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I wanted to see if anyone else has experienced performance issues
>>>>>> with the newest version of nifi and MergeRecord? We have been running on
>>>>>> nifi 1.9.2 for a while now, and recently upgraded to nifi 1.11.4. Once
>>>>>> upgraded, our identical flows were no longer able to keep up with our
>>>>>> data mainly at MergeRecord processors.
>>>>>>
>>>>>> We ended up downgrading back to nifi 1.9.2. Once we downgraded, all
>>>>>> was keeping up again. There were no errors to speak of when we were
>>>>>> running the flow with 1.11.4. We did see higher load on the OS, but this
>>>>>> may have been caused by the fact there was such a tremendous backlog
>>>>>> built up in the flow.
>>>>>>
>>>>>> Another side note, we saw one UpdateRecord processor producing errors
>>>>>> when I tested the flow with nifi 1.11.4 with a small test flow. I was
>>>>>> able to fix this issue by changing some parameters in my RecordWriter. So
>>>>>> perhaps some underlying ways records are being handled since 1.9.2 caused
>>>>>> the performance issue we saw?
>>>>>>
>>>>>> Any insight anyone has would be greatly appreciated, as we very much
>>>>>> would like to upgrade to nifi 1.11.4. One thought was switching the
>>>>>> MergeRecord processors to MergeContent since I've been told MergeContent
>>>>>> seems to perform better, but not sure if this is actually true. We are
>>>>>> using the pattern of chaining a few MergeRecord processors together to
>>>>>> help with performance.
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>
>>>>
>
