Robert,

What kind of performance degradation were you seeing here? I put together some 
simple flows to see if I could reproduce the issue using 1.9.2 and the current 
master branch. My flow consisted of GenerateFlowFile (generating 2 CSV rows per 
FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write Avro) -> 
UpdateAttribute, to mimic your setup as closely as I could from the details I have.

I did see a performance degradation of roughly 10%. On my laptop, throughput 
went from 2.49 million FlowFiles in 5 minutes on 1.9.2 to 2.25 million on the 
master branch. Interestingly, enabling Snappy compression made no real 
difference.
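As a quick sanity check on the "roughly 10%" figure, here is the arithmetic as a 
small Python sketch. It uses only the counts reported above (5-minute runs, 
2.49 million vs. 2.25 million FlowFiles); the variable names are just for 
illustration.

```python
# Throughput figures reported above: FlowFiles processed in a 5-minute run.
RUN_SECONDS = 5 * 60
flowfiles_1_9_2 = 2_490_000   # NiFi 1.9.2
flowfiles_master = 2_250_000  # current master branch

# Convert run totals to per-second rates.
rate_before = flowfiles_1_9_2 / RUN_SECONDS
rate_after = flowfiles_master / RUN_SECONDS

# Relative drop, as a percentage of the 1.9.2 baseline.
drop_pct = 100 * (flowfiles_1_9_2 - flowfiles_master) / flowfiles_1_9_2

print(f"1.9.2:  {rate_before:,.0f} FlowFiles/s")
print(f"master: {rate_after:,.0f} FlowFiles/s")
print(f"drop:   {drop_pct:.1f}%")
```

That works out to about 8,300 vs. 7,500 FlowFiles/s, a drop of roughly 9.6%.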

For reference, I also tried removing MergeRecord, leaving just Generate -> 
Convert -> UpdateAttribute, and saw the same roughly 10% performance 
degradation.

I’m curious whether you’re seeing more than that. If so, a template would help 
us understand what’s different.

Thanks
-Mark


On Apr 24, 2020, at 4:50 PM, Robert R. Bruno wrote:

Joe,

In that part of the flow, we are using Avro readers and writers with Snappy 
compression (which could be part of the problem). Since the data is Avro at 
that point, the reader uses the embedded schema, and the writer uses the schema 
name property along with an internal schema registry in NiFi.

I'll see what I can share.

Thanks

On Fri, Apr 24, 2020 at 4:41 PM Joe Witt wrote:
Robert,

Can you please detail the record readers and writers involved and how schemas 
are accessed? There can be very important performance-related changes in the 
parsers/serializers of the given formats. We've also added a lot to make schema 
caching really capable, but you have to opt into it. It is of course possible 
that MergeRecord itself is the culprit for the performance reduction, but let's 
get a fuller picture here.

Are you able to share a template and sample data which we can use to replicate?

Thanks

On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno wrote:
Has anyone else experienced performance issues with MergeRecord in the newest 
version of NiFi? We had been running NiFi 1.9.2 for a while and recently 
upgraded to NiFi 1.11.4. Once upgraded, our identical flows were no longer able 
to keep up with our data, mainly at the MergeRecord processors.

We ended up downgrading back to NiFi 1.9.2, and once we did, everything kept up 
again. There were no errors to speak of while we were running the flow on 
1.11.4. We did see higher load on the OS, but this may have been caused by the 
tremendous backlog that had built up in the flow.

As a side note, when I tested NiFi 1.11.4 with a small test flow, one 
UpdateRecord processor produced errors. I was able to fix this by changing some 
parameters in my RecordWriter. So perhaps some change since 1.9.2 in how records 
are handled under the hood caused the performance issue we saw?

Any insight would be greatly appreciated, as we would very much like to upgrade 
to NiFi 1.11.4. One thought was switching the MergeRecord processors to 
MergeContent, since I've been told MergeContent performs better, but I'm not 
sure whether that is actually true. We are using the pattern of chaining a few 
MergeRecord processors together to help with performance.

Thanks in advance!
