Re: MergeRecord performance

Robert R. Bruno Mon, 01 Jun 2020 14:12:55 -0700

I have the back pressure object threshold set to 100000 on that queue, and my
swap threshold is 200000.  I don't think the number of flow files in the
queue in question was very high when I had the issue, though, since the issue
was now at updaterecord, after a mergecontent that had greatly reduced the
number of flow files.
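
As a rough illustration of why those two settings make heavy swapping unlikely
on this queue, here is a minimal Python sketch.  It assumes a simplified model:
NiFi starts swapping FlowFiles out only once a queue holds more than the swap
threshold (the nifi.queue.swap.threshold property in nifi.properties), and back
pressure keeps the queue at or near the object threshold.  The function name is
hypothetical, and real behavior is softer than this, since a task that is
already running can push a queue past its back pressure threshold.

```python
def swap_possible(back_pressure_objects: int, swap_threshold: int) -> bool:
    """Return True if the queue can normally grow past the swap threshold.

    Simplified assumption (not NiFi's actual code): back pressure holds the
    queue at or near back_pressure_objects, so swapping only becomes likely
    when that limit is above the swap threshold.
    """
    return back_pressure_objects > swap_threshold


# Values from this message: back pressure 100000, swap threshold 200000.
print(swap_possible(100_000, 200_000))
```

Under that model, a queue capped around 100000 objects should not normally
reach a 200000 swap threshold.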


On Mon, Jun 1, 2020, 16:02 Mark Payne <[email protected]> wrote:

> Hey Robert,
>
> How big are the FlowFile queues that you have in front of your
> MergeContent/MergeRecord processors? Or, more specifically, what do you
> have configured for the back pressure threshold? I ask because there was a
> fix in 1.11.0 [1] that had to do with ordering when swapping and ensuring
> that data remains in the same order after being swapped out and swapped
> back in when using the FIFO prioritizer.
>
> Some of the changes there can actually change the thresholds when we
> perform swapping. So I’m curious if you’re seeing a lot of swapping of
> FlowFiles to/from disk when running in 1.11.4 that you didn’t have in
> 1.9.2. Are you seeing logs about swapping occurring? And of note, when I
> talk about swapping, I’m talking about NiFi-level FlowFile swapping, not
> OS-level swapping.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7011
>
>
> On May 22, 2020, at 10:35 AM, Robert R. Bruno <[email protected]> wrote:
>
> Sorry one other thing I thought of that may help.  I noticed on 1.11.4
> when I would stop the updaterecord processor it would take a long period of
> time for the processor to stop (threads were hanging), but when I went back
> to 1.9.2 the processor would stop in a very timely manner.  Not sure if
> that helps, but just another data point.
>
> On Fri, May 22, 2020 at 9:22 AM Robert R. Bruno <[email protected]> wrote:
>
>> I had more updates on this.
>>
>> Yesterday I again attempted to upgrade one of our 1.9.2 clusters that is
>> now using mergecontent vs mergerecord.  The flow had been running on 1.9.2
>> for about a week with no issue.  I did the upgrade to 1.11.4, and saw about
>> 3 of 10 nodes not being able to keep up.  The load on these 3 nodes became
>> very high.  For perspective, a load of 80 is about as high as we like to
>> see these boxes, and some were getting as high as 120.  I saw one
>> bottleneck forming at an updaterecord.  I tried giving that processor a few
>> more threads to see if it would help work off the backlog.  No matter what
>> I tried (lowering threads, changing mergecontent sizes, etc.) the load
>> wouldn't go down on those 3 boxes, and they either had a slowly growing
>> backlog or would maintain the backlog they had.
>>
>> I then decided to downgrade nifi back to 1.9.2 without rebooting the
>> boxes.  I kept all flow files and content as they were.  Upon downgrading
>> no loads were above 50 and this was only on the boxes that had the backlog
>> that formed when we did the upgrade.  The backlog on the 3 boxes worked off
>> with no issue at all, and without me having to make changes to the flow.
>> Once backlogs were worked off then our loads all sat around 20.
>>
>> This is similar behavior to what we saw before, but just in another
>> part of the flow.  Has anyone else seen anything like this on 1.11.4?
>> Unfortunately for now we can't upgrade due to this problem.  Any thoughts
>> from anyone would be greatly appreciated.
>>
>> Thanks,
>> Robert
>>
>> On Fri, May 8, 2020 at 4:47 PM Robert R. Bruno <[email protected]> wrote:
>>
>>> Sorry for the delayed answer, but was doing some testing this week and
>>> found a few more things out.
>>>
>>> First to answer some of your questions.
>>>
>>> I would say with no actual raw numbers, it was worse than a 10%
>>> degradation.  I say this since the flow was badly backing up, and a 10%
>>> decrease in performance should not have caused this since normally we can
>>> work off a backlog of data with no issues.  I looked at my mergerecord
>>> settings, and I am largely using size as the limiting factor.  I have a max
>>> size of 4MB and a max bin age of 1 minute followed by a second mergerecord
>>> with a max size of 32MB and a max bin age of 5 minutes.
>>>
>>> I changed our flow a bit on a test system that was running 1.11.4, and
>>> discovered the following:
>>>
>>> I changed mergerecords to mergecontents.  I used pretty much all of the
>>> same settings in the mergecontent but had the mergecontent deal with the
>>> avro natively.  In this flow, it currently seems like I don't need to chain
>>> multiple mergecontents together like I did with mergerecords.
>>>
>>> I then fed the merged avro from the mergecontent to a convertrecord to
>>> convert the data to parquet.  The convertrecord was tremendously slower
>>> than the mergecontent and became a bottleneck.  I then switched the
>>> convertrecord to the convertavrotoparquet processor.  Convertavrotoparquet
>>> can easily handle the output speed of the mergecontent and then some.
>>>
>>> My hope is to make these changes to our actual flow soon, and then
>>> upgrade to 1.11.4 again.  I'll let you know how that goes.
>>>
>>> Thanks,
>>> Robert
>>>
>>> On Mon, Apr 27, 2020 at 9:26 AM Mark Payne <[email protected]> wrote:
>>>
>>>> Robert,
>>>>
>>>> What kind of performance degradation were you seeing here? I put
>>>> together some simple flows to see if I could reproduce using 1.9.2 and
>>>> current master.
>>>> My flow consisted of GenerateFlowFile (generating 2 CSV rows per
>>>> FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro)
>>>> -> UpdateAttribute to try to mimic what you’ve got, given the details that
>>>> I have.
>>>>
>>>> I did see a performance degradation on the order of about 10%. So on my
>>>> laptop I went from processing 2.49 MM FlowFiles in 1.9.2 in 5 mins to 2.25
>>>> MM on the master branch. Interestingly, I saw no real change when I enabled
>>>> Snappy compression.
>>>>
>>>> For a point of reference, I also tried removing MergeRecord and just
>>>> Generate -> Convert -> UpdateAttribute. I saw the same roughly 10%
>>>> performance degradation.
>>>>
>>>> I’m curious if you’re seeing more than that. If so, I think a template
>>>> would be helpful to understand what’s different.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>>
>>>> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <[email protected]> wrote:
>>>>
>>>> Joe,
>>>>
>>>> In that part of the flow, we are using avro readers and writers.  We
>>>> are using snappy compression (which could be part of the problem).  Since
>>>> we are using avro at that point the embedded schema is being used by the
>>>> reader and the writer is using the schema name property along with an
>>>> internal schema registry in nifi.
>>>>
>>>> I can see what could potentially be shared.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <[email protected]> wrote:
>>>>
>>>>> Robert,
>>>>>
>>>>> Can you please detail the record readers and writers involved and how
>>>>> schemas are accessed?  There can be very important performance-related
>>>>> changes in the parsers/serializers of the given formats.  And we've added
>>>>> a lot to make schema caching really capable, but you have to opt into it.
>>>>> It is of course possible MergeRecord itself is the culprit for the
>>>>> performance reduction, but let's get a fuller picture here.
>>>>>
>>>>> Are you able to share a template and sample data which we can use to
>>>>> replicate?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I wanted to see if anyone else has experienced performance issues
>>>>>> with the newest version of nifi and MergeRecord?  We have been running on
>>>>>> nifi 1.9.2 for a while now, and recently upgraded to nifi 1.11.4.  Once
>>>>>> upgraded, our identical flows were no longer able to keep up with our
>>>>>> data, mainly at MergeRecord processors.
>>>>>>
>>>>>> We ended up downgrading back to nifi 1.9.2.  Once we downgraded, all
>>>>>> was keeping up again.  There were no errors to speak of when we were
>>>>>> running the flow with 1.11.4.  We did see higher load on the OS, but this
>>>>>> may have been caused by the fact there was such a tremendous backlog
>>>>>> built up in the flow.
>>>>>>
>>>>>> Another side note, we saw one UpdateRecord processor producing errors
>>>>>> when I tested the flow with nifi 1.11.4 with a small test flow.  I was
>>>>>> able to fix this issue by changing some parameters in my RecordWriter.  So
>>>>>> perhaps some underlying ways records are being handled since 1.9.2 caused
>>>>>> the performance issue we saw?
>>>>>>
>>>>>> Any insight anyone has would be greatly appreciated, as we very much
>>>>>> would like to upgrade to nifi 1.11.4.  One thought was switching the
>>>>>> MergeRecord processors to MergeContent since I've been told MergeContent
>>>>>> seems to perform better, but not sure if this is actually true.  We are
>>>>>> using the pattern of chaining a few MergeRecord processors together to
>>>>>> help with performance.
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>
>>>>
>
