Sorry, one other thing I thought of that may help.  I noticed on 1.11.4
that when I would stop the UpdateRecord processor it would take a long
time for the processor to stop (threads were hanging), but when I went
back to 1.9.2 the processor would stop in a very timely manner.  Not sure
if that helps, but it's another data point.

On Fri, May 22, 2020 at 9:22 AM Robert R. Bruno <[email protected]> wrote:

> I had more updates on this.
>
> Yesterday I again attempted to upgrade one of our 1.9.2 clusters, which is
> now using MergeContent instead of MergeRecord.  The flow had been running
> on 1.9.2 for about a week with no issue.  I did the upgrade to 1.11.4 and
> saw about 3 of 10 nodes not being able to keep up.  The load on these 3
> nodes became very high.  For perspective, a load of 80 is about as high as
> we like to see on these boxes, and some were getting as high as 120.  I saw
> one bottleneck forming at an UpdateRecord processor.  I tried giving that
> processor a few more threads to see if it would help work off the backlog.
> No matter what I tried (lowering threads, changing MergeContent sizes,
> etc.), the load wouldn't go down on those 3 boxes, and they had either a
> slowly growing backlog or would maintain the backlog they had.
>
> I then decided to downgrade NiFi back to 1.9.2 without rebooting the
> boxes.  I kept all flow files and content as they were.  Upon downgrading,
> no loads were above 50, and that was only on the boxes that had the backlog
> that formed when we did the upgrade.  The backlog on the 3 boxes worked off
> with no issue at all, and without me having to make changes to the flow.
> Once the backlogs were worked off, our loads all sat around 20.
>
> This is similar behavior to what we saw before, just in another part of
> the flow.  Has anyone else seen anything like this on 1.11.4?
> Unfortunately, for now we can't upgrade due to this problem.  Any thoughts
> from anyone would be greatly appreciated.
>
> Thanks,
> Robert
>
> On Fri, May 8, 2020 at 4:47 PM Robert R. Bruno <[email protected]> wrote:
>
>> Sorry for the delayed answer, but I was doing some testing this week and
>> found out a few more things.
>>
>> First to answer some of your questions.
>>
>> Without actual raw numbers, I would say it was worse than a 10%
>> degradation.  I say this because the flow was badly backing up, and a 10%
>> decrease in performance should not have caused that, since normally we can
>> work off a backlog of data with no issues.  I looked at my MergeRecord
>> settings, and I am largely using size as the limiting factor.  I have a
>> max size of 4 MB and a max bin age of 1 minute, followed by a second
>> MergeRecord with a max size of 32 MB and a max bin age of 5 minutes.
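>>
>> Roughly, the chained settings look like this (property names are
>> approximate, from memory; everything else is left at the defaults):
>>
>>   MergeRecord #1:  Maximum Bin Size = 4 MB,   Max Bin Age = 1 min
>>   MergeRecord #2:  Maximum Bin Size = 32 MB,  Max Bin Age = 5 min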
>>
>> I changed our flow a bit on a test system that was running 1.11.4, and
>> discovered the following:
>>
>> I changed the MergeRecords to MergeContents.  I used pretty much all of
>> the same settings in the MergeContent, but had the MergeContent deal with
>> the Avro natively.  In this flow, it currently seems like I don't need to
>> chain multiple MergeContents together like I did with MergeRecords.
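>>
>> The key difference is just letting MergeContent handle the Avro container
>> format itself, i.e. roughly (property name approximate):
>>
>>   MergeContent:  Merge Format = Avro   (size/bin-age settings carried over
>>                                          from the MergeRecord config above)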
>>
>> I then fed the merged Avro from the MergeContent to a ConvertRecord to
>> convert the data to Parquet.  The ConvertRecord was tremendously slower
>> than the MergeContent and became a bottleneck.  I then switched the
>> ConvertRecord to the ConvertAvroToParquet processor.  ConvertAvroToParquet
>> can easily handle the output speed of the MergeContent and then some.
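>>
>> For anyone curious, the conversion that processor performs is essentially
>> a plain Avro-to-Parquet rewrite.  A rough standalone sketch using the
>> parquet-avro library (just an illustration, not NiFi's actual
>> ConvertAvroToParquet code) would be:
>>
>>   import org.apache.avro.Schema;
>>   import org.apache.avro.file.DataFileStream;
>>   import org.apache.avro.generic.GenericDatumReader;
>>   import org.apache.avro.generic.GenericRecord;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.parquet.avro.AvroParquetWriter;
>>   import org.apache.parquet.hadoop.ParquetWriter;
>>   import org.apache.parquet.hadoop.metadata.CompressionCodecName;
>>
>>   import java.io.FileInputStream;
>>   import java.io.InputStream;
>>
>>   public class AvroToParquetSketch {
>>       public static void main(String[] args) throws Exception {
>>           // args[0] = merged Avro datafile, args[1] = Parquet output path
>>           try (InputStream in = new FileInputStream(args[0]);
>>                DataFileStream<GenericRecord> reader =
>>                        new DataFileStream<>(in, new GenericDatumReader<>())) {
>>               Schema schema = reader.getSchema();  // embedded Avro schema
>>               try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
>>                       .<GenericRecord>builder(new Path(args[1]))
>>                       .withSchema(schema)
>>                       .withCompressionCodec(CompressionCodecName.SNAPPY)
>>                       .build()) {
>>                   for (GenericRecord record : reader) {
>>                       writer.write(record);
>>                   }
>>               }
>>           }
>>       }
>>   }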
>>
>> My hope is to make these changes to our actual flow soon, and then
>> upgrade to 1.11.4 again.  I'll let you know how that goes.
>>
>> Thanks,
>> Robert
>>
>> On Mon, Apr 27, 2020 at 9:26 AM Mark Payne <[email protected]> wrote:
>>
>>> Robert,
>>>
>>> What kind of performance degradation were you seeing here? I put
>>> together some simple flows to see if I could reproduce using 1.9.2 and
>>> current master.
>>> My flow consisted of GenerateFlowFile (generating 2 CSV rows per
>>> FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro)
>>> -> UpdateAttribute to try to mimic what you’ve got, given the details that
>>> I have.
>>>
>>> I did see a performance degradation on the order of about 10%.  On my
>>> laptop, I went from processing 2.49 MM FlowFiles in 5 mins on 1.9.2 to
>>> 2.25 MM on the master branch.  Interestingly, I saw no real change when I
>>> enabled Snappy compression.
>>>
>>> For a point of reference, I also tried removing MergeRecord and just
>>> Generate -> Convert -> UpdateAttribute. I saw the same roughly 10%
>>> performance degradation.
>>>
>>> I’m curious if you’re seeing more than that. If so, I think a template
>>> would be helpful to understand what’s different.
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <[email protected]> wrote:
>>>
>>> Joe,
>>>
>>> In that part of the flow, we are using Avro readers and writers.  We are
>>> using Snappy compression (which could be part of the problem).  Since we
>>> are using Avro at that point, the embedded schema is being used by the
>>> reader, and the writer is using the schema name property along with an
>>> internal schema registry in NiFi.
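>>>
>>> Concretely, the controller services are set up roughly like this
>>> (property names approximate):
>>>
>>>   AvroReader:           Schema Access Strategy = Use Embedded Avro Schema
>>>   AvroRecordSetWriter:  Schema Access Strategy = Use 'Schema Name' Property
>>>                         Schema Registry        = (internal AvroSchemaRegistry)
>>>                         Compression Format     = SNAPPY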
>>>
>>> I will see what could potentially be shared.
>>>
>>> Thanks
>>>
>>> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <[email protected]> wrote:
>>>
>>>> Robert,
>>>>
>>>> Can you please detail the record readers and writers involved and how
>>>> schemas are accessed?  There can be very important performance-related
>>>> changes in the parsers/serializers of the given formats.  And we've added
>>>> a lot to make schema caching really capable, but you have to opt into it.
>>>> It is of course possible that MergeRecord itself is the culprit for the
>>>> performance reduction, but let's get a fuller picture here.
>>>>
>>>> Are you able to share a template and sample data which we can use to
>>>> replicate?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <[email protected]>
>>>> wrote:
>>>>
>>>>> I wanted to see if anyone else has experienced performance issues with
>>>>> the newest version of NiFi and MergeRecord.  We have been running NiFi
>>>>> 1.9.2 for a while now and recently upgraded to NiFi 1.11.4.  Once
>>>>> upgraded, our identical flows were no longer able to keep up with our
>>>>> data, mainly at MergeRecord processors.
>>>>>
>>>>> We ended up downgrading back to NiFi 1.9.2.  Once we downgraded,
>>>>> everything was keeping up again.  There were no errors to speak of when
>>>>> we were running the flow on 1.11.4.  We did see higher load on the OS,
>>>>> but this may have been caused by the fact that there was such a
>>>>> tremendous backlog built up in the flow.
>>>>>
>>>>> As another side note, we saw one UpdateRecord processor producing
>>>>> errors when I tested the flow on NiFi 1.11.4 with a small test flow.  I
>>>>> was able to fix this issue by changing some parameters in my
>>>>> RecordWriter.  So perhaps some underlying change in how records are
>>>>> handled since 1.9.2 caused the performance issue we saw?
>>>>>
>>>>> Any insight anyone has would be greatly appreciated, as we very much
>>>>> would like to upgrade to NiFi 1.11.4.  One thought was switching the
>>>>> MergeRecord processors to MergeContent, since I've been told
>>>>> MergeContent seems to perform better, but I'm not sure if this is
>>>>> actually true.  We are using the pattern of chaining a few MergeRecord
>>>>> processors together to help with performance.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>
>>>
