Sorry for the delayed answer; I was doing some testing this week and
found out a few more things.

First to answer some of your questions.

I don't have actual raw numbers, but I would say it was worse than a 10%
degradation.  I say this because the flow was badly backing up, and a 10%
decrease in performance should not have caused that, since normally we can
work off a backlog of data with no issues.  I looked at my MergeRecord
settings, and I am largely using size as the limiting factor.  I have a
max bin size of 4 MB and a max bin age of 1 minute, followed by a second
MergeRecord with a max bin size of 32 MB and a max bin age of 5 minutes.
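
Concretely, the chain is roughly this (property names as in the
MergeRecord docs; everything else is left at its defaults):

    MergeRecord #1:  Maximum Bin Size = 4 MB,   Max Bin Age = 1 min
    MergeRecord #2:  Maximum Bin Size = 32 MB,  Max Bin Age = 5 min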

I changed our flow a bit on a test system that was running 1.11.4, and
discovered the following:

I changed the MergeRecord processors to MergeContent.  I used pretty much
all of the same settings in MergeContent but had it handle the Avro
natively.  In this flow, it currently seems like I don't need to chain
multiple MergeContent processors together like I did with MergeRecord.
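
For reference, the settings look something like this (the size/age limits
shown are illustrative; Merge Format = Avro is what lets MergeContent
handle the Avro natively, without a record reader/writer):

    Merge Strategy: Bin-Packing Algorithm
    Merge Format: Avro
    Maximum Group Size: 32 MB
    Max Bin Age: 5 min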

I then fed the merged Avro from the MergeContent into a ConvertRecord to
convert the data to Parquet.  The ConvertRecord was tremendously slower
than the MergeContent and became a bottleneck.  I then switched the
ConvertRecord to the ConvertAvroToParquet processor.  ConvertAvroToParquet
can easily handle the output rate of the MergeContent and then some.
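
So the back half of the test flow now looks like:

    ... -> MergeContent (Merge Format = Avro) -> ConvertAvroToParquet -> ...

My guess is that ConvertAvroToParquet is faster because it hands the Avro
straight to the Parquet writer, instead of round-tripping every record
through a record reader/writer pair the way ConvertRecord does.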

My hope is to make these changes to our actual flow soon, and then upgrade
to 1.11.4 again.  I'll let you know how that goes.

Thanks,
Robert

On Mon, Apr 27, 2020 at 9:26 AM Mark Payne <[email protected]> wrote:

> Robert,
>
> What kind of performance degradation were you seeing here? I put together
> some simple flows to see if I could reproduce using 1.9.2 and current
> master.
> My flow consisted of GenerateFlowFile (generating 2 CSV rows per FlowFile)
> -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro) ->
> UpdateAttribute to try to mimic what you’ve got, given the details that I
> have.
>
> I did see a performance degradation on the order of about 10%.  So on my
> laptop, I went from processing 2.49 MM FlowFiles in 5 minutes on 1.9.2 to
> 2.25 MM on the master branch.  Interestingly, I saw no real change when I
> enabled Snappy compression.
>
> For a point of reference, I also tried removing MergeRecord and running
> just Generate -> Convert -> UpdateAttribute.  I saw roughly the same 10%
> performance degradation.
>
> I’m curious if you’re seeing more than that. If so, I think a template
> would be helpful to understand what’s different.
>
> Thanks
> -Mark
>
>
> On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <[email protected]> wrote:
>
> Joe,
>
> In that part of the flow, we are using Avro readers and writers.  We are
> using Snappy compression (which could be part of the problem).  Since we
> are using Avro at that point, the embedded schema is being used by the
> reader, and the writer is using the schema name property along with an
> internal schema registry in NiFi.
>
> I can see what could potentially be shared.
>
> Thanks
>
> On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <[email protected]> wrote:
>
>> Robert,
>>
>> Can you please detail the record readers and writers involved and how
>> schemas are accessed?  There can be very important performance-related
>> changes in the parsers/serializers of the given formats.  And we've added a
>> lot to make schema caching really capable but you have to opt into it.  It
>> is of course possible MergeRecord itself is the culprit for the
>> performance reduction, but let's get a fuller picture here.
>>
>> Are you able to share a template and sample data which we can use to
>> replicate?
>>
>> Thanks
>>
>> On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <[email protected]>
>> wrote:
>>
>>> I wanted to see if anyone else has experienced performance issues with
>>> the newest version of NiFi and MergeRecord.  We have been running on
>>> NiFi 1.9.2 for a while now, and recently upgraded to NiFi 1.11.4.  Once
>>> upgraded, our identical flows were no longer able to keep up with our
>>> data, mainly at the MergeRecord processors.
>>>
>>> We ended up downgrading back to nifi 1.9.2.  Once we downgraded, all was
>>> keeping up again.  There were no errors to speak of when we were running
>>> the flow with 1.11.4.  We did see higher load on the OS, but this may
>>> have been caused by the fact that there was such a tremendous backlog
>>> built up in the flow.
>>>
>>> As another side note, we saw one UpdateRecord processor producing errors
>>> when I tested a small test flow with NiFi 1.11.4.  I was able to fix
>>> this issue by changing some parameters in my RecordWriter.  So perhaps
>>> some underlying change in how records are handled since 1.9.2 caused
>>> the performance issue we saw?
>>>
>>> Any insight anyone has would be greatly appreciated, as we very much
>>> would like to upgrade to NiFi 1.11.4.  One thought was switching the
>>> MergeRecord processors to MergeContent, since I've been told MergeContent
>>> seems to perform better, but I am not sure if this is actually true.  We
>>> are using the pattern of chaining a few MergeRecord processors together
>>> to help with performance.
>>>
>>> Thanks in advance!
>>>
>>
>
