In that case you're looking to merge about 500,000 FlowFiles into a single
FlowFile, so you'll definitely want to use a cascading approach. I'd shoot for
about 1 MB for the first MergeRecord and then merge 128 of those together for
the second MergeRecord.
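
As a rough check on those numbers, here is a small Python sketch of the cascade sizing. It assumes ~300-byte records, a 128 MB (binary) final file, and 1 MB first-level bins as described above. The function and constant names are made up for illustration and are not part of NiFi.

```python
# Sketch: sizing a two-level MergeRecord cascade.
# Assumptions (illustrative, not from any NiFi API): ~300-byte records,
# a 128 MB target file, and 1 MB first-level bins.
import math

RECORD_BYTES = 300
TARGET_BYTES = 128 * 1024 * 1024      # final Parquet file size
FIRST_LEVEL_BYTES = 1 * 1024 * 1024   # output size of the first MergeRecord


def cascade_plan(record_bytes: int, first_level_bytes: int, target_bytes: int) -> dict:
    """Return how many inputs each merge level must bin together."""
    # Small FlowFiles needed to fill one first-level bin.
    first_level_inputs = math.ceil(first_level_bytes / record_bytes)
    # First-level outputs needed to fill one final bin.
    second_level_inputs = math.ceil(target_bytes / first_level_bytes)
    # Small FlowFiles a single-level merge would have to hold in one bin.
    single_level_inputs = math.ceil(target_bytes / record_bytes)
    return {
        "single_level_inputs": single_level_inputs,
        "first_level_inputs": first_level_inputs,
        "second_level_inputs": second_level_inputs,
    }


plan = cascade_plan(RECORD_BYTES, FIRST_LEVEL_BYTES, TARGET_BYTES)
for name, count in plan.items():
    print(f"{name}: {count:,}")
```

Under these assumptions a single-level merge would need roughly 447,000 FlowFiles in one bin, while the cascade needs only about 3,500 per first-level bin and 128 per second-level bin. In a flow, figures like these would go into each MergeRecord's bin size and record count limits.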

The provenance backpressure is occurring because of the large number of
provenance events being generated. One event will be generated, more or less,
each time a Processor touches a FlowFile. So if you merge the FlowFiles
together as early as possible, you'll reduce the load that you're putting on
the Provenance Repository.
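
To illustrate why merging early helps, the sketch below turns the "one event per touch" rule of thumb into arithmetic for a hypothetical flow of four processors plus a MergeRecord. The counts are illustrative assumptions, and the model is only an approximation of what the Provenance Repository actually records.

```python
# Rough model of the rule of thumb: about one provenance event per processor
# touch per FlowFile. The flow shape and counts are hypothetical.
import math


def estimate_events(record_count: int, records_per_merge: int,
                    small_touches: int, merged_touches: int) -> int:
    """Estimate provenance events for one batch of data passing through a flow.

    small_touches:  processors that handle each small FlowFile (merge included)
    merged_touches: processors that handle each merged FlowFile afterwards
    """
    merged_count = math.ceil(record_count / records_per_merge)
    return record_count * small_touches + merged_count * merged_touches


RECORDS = 447_393      # ~128 MB of 300-byte records
PER_MERGE = 3_496      # ~1 MB per first-level merge

# Four processors plus MergeRecord, with the merge placed early or at the end.
early = estimate_events(RECORDS, PER_MERGE, small_touches=2, merged_touches=3)
late = estimate_events(RECORDS, PER_MERGE, small_touches=5, merged_touches=0)
print(f"merge early: ~{early:,} events")
print(f"merge late:  ~{late:,} events")
```

Moving the merge from the end of the flow to right after the first processor cuts the estimate from about 2.2 million to about 0.9 million events for the same data.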

Also, depending on how you're getting the data into your flow, if you're able
to, it's best to receive a larger "micro-batch" of records per FlowFile to begin
with rather than splitting them up. This would greatly alleviate the pressure on
the Provenance Repository and avoid the need for multiple MergeRecord processors
as well.
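
Following the reasoning in the quoted question (that all the FlowFiles for a bin have to fit within the incoming connection's back pressure limit), this sketch shows how the batch size at ingest changes the number of FlowFiles one 128 MB bin needs. It assumes 300-byte records and a 10,000-FlowFile object threshold, taken here as the usual connection default; check what your connections actually use.

```python
# Sketch: how many FlowFiles one 128 MB bin needs for different ingest batch
# sizes, compared with an assumed back pressure object threshold.
import math

TARGET_BYTES = 128 * 1024 * 1024  # size of the final merged file
RECORD_BYTES = 300                # approximate size of one record
# Assumption: 10,000 FlowFiles, taken here as the connection default; check yours.
BACKPRESSURE_OBJECTS = 10_000


def flowfiles_per_bin(records_per_flowfile: int) -> int:
    """FlowFiles needed to fill one TARGET_BYTES bin at the given batch size."""
    bytes_per_flowfile = records_per_flowfile * RECORD_BYTES
    return math.ceil(TARGET_BYTES / bytes_per_flowfile)


for batch in (1, 100, 1_000):
    needed = flowfiles_per_bin(batch)
    fits = needed <= BACKPRESSURE_OBJECTS
    print(f"{batch:>5,} records/FlowFile -> {needed:>7,} FlowFiles per bin "
          f"(single MergeRecord fits: {fits})")
```

Under these assumptions, even 100 records per FlowFile brings a single MergeRecord well under the threshold.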

Also of note, there is a newer version of the Provenance Repository that you
can switch to by changing the "nifi.provenance.repository.implementation"
property to "org.apache.nifi.provenance.WriteAheadProvenanceRepository". The
Write-Ahead version is quite a bit faster and behaves differently than the
Persistent Provenance Repository, so you won't see those warnings about
provenance.
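
If you'd rather script the switch than edit the file by hand, a minimal Python sketch follows. It assumes the property lives in NiFi's conf/nifi.properties with plain key=value lines; the set_property helper is hypothetical, not a NiFi tool.

```python
# Minimal sketch: point NiFi at the Write-Ahead Provenance Repository by
# rewriting one key in a Java-style .properties file. Only "key=value" lines
# are handled (no ':' separators or line continuations); set_property is a
# hypothetical helper, not part of NiFi.
from pathlib import Path

PROPERTY = "nifi.provenance.repository.implementation"
WRITE_AHEAD = "org.apache.nifi.provenance.WriteAheadProvenanceRepository"


def set_property(path: Path, key: str, value: str) -> None:
    """Set key=value in the file, appending the line if the key is absent."""
    updated = []
    found = False
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.lstrip()
        # Leave comment lines untouched; compare only the text before the first '='.
        is_match = (not stripped.startswith("#")
                    and stripped.split("=", 1)[0].strip() == key)
        if is_match:
            updated.append(f"{key}={value}")
            found = True
        else:
            updated.append(line)
    if not found:
        updated.append(f"{key}={value}")
    path.write_text("\n".join(updated) + "\n", encoding="utf-8")
```

For example, call set_property(Path("conf/nifi.properties"), PROPERTY, WRITE_AHEAD) from the NiFi install directory, keeping a backup of the file first, and then restart NiFi so the new repository implementation is picked up.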

I hope this helps!

> On Apr 13, 2018, at 10:30 AM, DEHAY Aurelien wrote:
> Hello.
> It's me again regarding my MergeRecord question.
> I still can't get what I want. I may have understood how bin-based processors
> work; this is for clarification and a question regarding performance.
> I want to merge a huge number of 300-byte flowfiles into a 128 MB Parquet file.
> My understanding is that, for MergeRecord to be able to create a bin with
> 128 MB of data, that data must be in the queue. We can't feed the bin "one flow
> at a time", so when working with small flowfiles, I have to set the
> backpressure parameter to something really high, or completely remove the
> flowfile-count backpressure limit.
> I understood by reading 
> that it's not the "good" way to do it, and that I should instead cascade
> multiple merges to "slowly" make the flowfile bigger?
> I've run some tests with a single level, but I hit the "provenance recording
> rate" limit. Will multiple levels help?
> Thanks for any help.
> Aurélien.
