Sorry, one other thing I thought of that may help. I noticed that on 1.11.4, when I would stop the UpdateRecord processor, it would take a long time for the processor to stop (threads were hanging), but back on 1.9.2 the processor would stop in a very timely manner. Not sure if that helps, but it's another data point.
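A general note on why stopping can be slow (this is the generic mechanism, not a claim about NiFi internals): when a framework stops a component, it typically waits for the component's in-flight task threads to finish, so threads that do not observe the stop request promptly make shutdown take as long as their remaining work. A minimal Python sketch of the cooperative case, with illustrative names:

```python
# Illustration of the general mechanism (not NiFi code): a task that polls
# a stop flag between units of work ends quickly when a stop is requested,
# while one that ignores the flag holds its thread for the full duration.

import threading

stop_requested = threading.Event()

def cooperative_task(iterations: int = 10_000) -> int:
    """Simulated unit of work that checks the stop flag between steps."""
    done = 0
    for _ in range(iterations):
        if stop_requested.is_set():
            break  # honor the stop request promptly
        done += 1
    return done

stop_requested.set()   # stop is requested before any work happens
steps = cooperative_task()
# steps is 0 here: the task observed the stop request on its first check
```

A thread dump taken while a processor is stuck in the stopping state would show which call the hanging threads are actually blocked in.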
On Fri, May 22, 2020 at 9:22 AM Robert R. Bruno <[email protected]> wrote:

> I had more updates on this.
>
> Yesterday I again attempted to upgrade one of our 1.9.2 clusters, which is now using MergeContent instead of MergeRecord. The flow had been running on 1.9.2 for about a week with no issue. I did the upgrade to 1.11.4 and saw about 3 of 10 nodes unable to keep up. The load on these 3 nodes became very high. For perspective, a load of 80 is about as high as we like to see on these boxes, and some were getting as high as 120. I saw one bottleneck forming at an UpdateRecord. I tried giving that processor a few more threads to see if it would help work off the backlog. No matter what I tried (lowering threads, changing MergeContent sizes, etc.), the load wouldn't go down on those 3 boxes, and they either had a slowly growing backlog or maintained the backlog they had.
>
> I then decided to downgrade NiFi back to 1.9.2 without rebooting the boxes. I kept all flow files and content as they were. Upon downgrading, no loads were above 50, and that was only on the boxes with the backlog that had formed when we did the upgrade. The backlog on those 3 boxes worked off with no issue at all, without me having to make changes to the flow. Once the backlogs were worked off, our loads all sat around 20.
>
> This is similar behavior to what we saw before, just in another part of the flow. Has anyone else seen anything like this on 1.11.4? Unfortunately, for now we can't upgrade due to this problem. Any thoughts from anyone would be greatly appreciated.
>
> Thanks,
> Robert
>
> On Fri, May 8, 2020 at 4:47 PM Robert R. Bruno <[email protected]> wrote:
>
>> Sorry for the delayed answer; I was doing some testing this week and found a few more things out.
>>
>> First, to answer some of your questions.
>>
>> With no actual raw numbers, I would say it was worse than a 10% degradation.
>> I say this because the flow was badly backing up, and a 10% decrease in performance should not have caused that, since we can normally work off a backlog of data with no issues. I looked at my MergeRecord settings, and I am largely using size as the limiting factor: a max size of 4 MB with a max bin age of 1 minute, followed by a second MergeRecord with a max size of 32 MB and a max bin age of 5 minutes.
>>
>> I changed our flow a bit on a test system that was running 1.11.4 and discovered the following:
>>
>> I changed the MergeRecords to MergeContents. I used pretty much all of the same settings in the MergeContent but had it deal with the Avro natively. In this flow, it currently seems I don't need to chain multiple MergeContents together as I did with MergeRecords.
>>
>> I then fed the merged Avro from the MergeContent into a ConvertRecord to convert the data to Parquet. The ConvertRecord was tremendously slower than the MergeContent and became a bottleneck. I then switched the ConvertRecord to the ConvertAvroToParquet processor. ConvertAvroToParquet can easily handle the output speed of the MergeContent and then some.
>>
>> My hope is to make these changes to our actual flow soon and then upgrade to 1.11.4 again. I'll let you know how that goes.
>>
>> Thanks,
>> Robert
>>
>> On Mon, Apr 27, 2020 at 9:26 AM Mark Payne <[email protected]> wrote:
>>
>>> Robert,
>>>
>>> What kind of performance degradation were you seeing here? I put together some simple flows to see if I could reproduce it using 1.9.2 and current master. My flow consisted of GenerateFlowFile (generating 2 CSV rows per FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro) -> UpdateAttribute, to try to mimic what you've got, given the details that I have.
>>>
>>> I did see a performance degradation on the order of about 10%.
>>> So on my laptop I went from processing 2.49 MM FlowFiles in 5 minutes on 1.9.2 to 2.25 MM on the master branch. Interestingly, I saw no real change when I enabled Snappy compression.
>>>
>>> For a point of reference, I also tried removing MergeRecord and just running Generate -> Convert -> UpdateAttribute. I saw roughly the same 10% performance degradation.
>>>
>>> I'm curious whether you're seeing more than that. If so, I think a template would be helpful to understand what's different.
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <[email protected]> wrote:
>>>
>>> Joe,
>>>
>>> In that part of the flow, we are using Avro readers and writers. We are using Snappy compression (which could be part of the problem). Since we are using Avro at that point, the embedded schema is used by the reader, and the writer uses the schema name property along with an internal schema registry in NiFi.
>>>
>>> I can see what could potentially be shared.
>>>
>>> Thanks
>>>
>>> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <[email protected]> wrote:
>>>
>>>> Robert,
>>>>
>>>> Can you please detail the record readers and writers involved and how schemas are accessed? There can be very important performance-related changes in the parsers/serializers of the given formats. And we've added a lot to make schema caching really capable, but you have to opt into it. It is of course possible that MergeRecord itself is the culprit for the performance reduction, but let's get a fuller picture here.
>>>>
>>>> Are you able to share a template and sample data that we can use to replicate?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <[email protected]> wrote:
>>>>
>>>>> I wanted to see if anyone else has experienced performance issues with the newest version of NiFi and MergeRecord. We have been running NiFi 1.9.2 for a while now and recently upgraded to NiFi 1.11.4.
>>>>> Once upgraded, our identical flows were no longer able to keep up with our data, mainly at the MergeRecord processors.
>>>>>
>>>>> We ended up downgrading back to NiFi 1.9.2. Once we downgraded, everything was keeping up again. There were no errors to speak of when we were running the flow with 1.11.4. We did see higher load on the OS, but that may have been caused by the tremendous backlog that had built up in the flow.
>>>>>
>>>>> Another side note: we saw one UpdateRecord processor producing errors when I tested the flow with NiFi 1.11.4 on a small test flow. I was able to fix that issue by changing some parameters in my RecordWriter. So perhaps some underlying change in how records are handled since 1.9.2 caused the performance issue we saw?
>>>>>
>>>>> Any insight anyone has would be greatly appreciated, as we very much would like to upgrade to NiFi 1.11.4. One thought was switching the MergeRecord processors to MergeContent, since I've been told MergeContent seems to perform better, but I'm not sure whether that is actually true. We are using the pattern of chaining a few MergeRecord processors together to help with performance.
>>>>>
>>>>> Thanks in advance!
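The two-stage merge settings described upthread (max 4 MB or 1 minute, then a second stage at max 32 MB or 5 minutes) boil down to a flush-on-size-or-age rule. A minimal sketch of that rule in plain Python, not NiFi code; the helper name and constants are illustrative:

```python
# Illustrative sketch (not NiFi internals) of the bin-eviction rule the
# MergeRecord/MergeContent settings in this thread describe: a bin of
# accumulated records is emitted once it reaches the max size OR exceeds
# the max bin age, whichever comes first.

MAX_BIN_SIZE_BYTES = 4 * 1024 * 1024  # "Max Bin Size" of 4 MB (stage 1)
MAX_BIN_AGE_SECONDS = 60.0            # "Max Bin Age" of 1 minute (stage 1)

def should_flush(bin_bytes: int, bin_age_seconds: float) -> bool:
    """Return True when the accumulated bin should be merged and emitted."""
    return (bin_bytes >= MAX_BIN_SIZE_BYTES
            or bin_age_seconds >= MAX_BIN_AGE_SECONDS)

# Stage 2 in the thread applies the same rule with 32 MB / 5 minutes,
# trading extra latency for fewer, larger output files.
```

Under light traffic the age limit dominates (small files emitted every interval); under heavy traffic the size limit dominates, which matches the observation that size was the limiting factor in this flow.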
