Good afternoon,

Another thing that might help you out...

You can also tweak the setting:

This setting controls the maximum FlowFile count held in memory for a
connection; once that count is exceeded, the excess FlowFiles are flushed
to disk.
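For illustration, here is a toy Python model of the behaviour described
above: a connection keeps up to a fixed number of FlowFiles in memory, and
anything beyond that count goes to a simulated on-disk store and is read
back, in order, as the queue drains. The class and its `swap_threshold`
parameter are hypothetical stand-ins for the setting, and this is a
simplification, not NiFi's actual swapping code.

```python
from collections import deque


class ToySwappingQueue:
    """Toy connection queue: at most `swap_threshold` items stay in memory;
    the excess is moved to a simulated disk store. Hypothetical model only,
    not NiFi's real swap implementation."""

    def __init__(self, swap_threshold):
        self.swap_threshold = swap_threshold
        self.in_memory = deque()
        self.swapped_out = deque()  # stands in for swap files on disk

    def offer(self, flowfile):
        # Only keep new arrivals in memory while under the threshold and
        # nothing older is waiting on disk (this preserves FIFO order).
        if len(self.in_memory) < self.swap_threshold and not self.swapped_out:
            self.in_memory.append(flowfile)
        else:
            self.swapped_out.append(flowfile)

    def poll(self):
        # Swap older items back in from "disk" before handing one out.
        while self.swapped_out and len(self.in_memory) < self.swap_threshold:
            self.in_memory.append(self.swapped_out.popleft())
        return self.in_memory.popleft() if self.in_memory else None

    def __len__(self):
        return len(self.in_memory) + len(self.swapped_out)


if __name__ == "__main__":
    q = ToySwappingQueue(swap_threshold=3)
    for i in range(5):
        q.offer(i)
    print(len(q.in_memory), "in memory,", len(q.swapped_out), "swapped out")
    print([q.poll() for _ in range(5)])
```

With a threshold of 3 and five FlowFiles queued, three stay in memory and
two are swapped out, yet they still come back out in arrival order.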

However, I am not sure whether there is a distinction between a "record
flowfile", since they live in memory, and the traditional way of thinking
of flowfiles.

On Fri, Apr 13, 2018 at 10:49 AM Mark Payne wrote:

> Aurélien,
>
> In that case you're looking to merge about 500,000 FlowFiles into a
> single FlowFile, so you'll definitely want to use a cascading approach.
> I'd shoot for about 1 MB for the first MergeRecord and then merge 128 of
> those together for the second MergeRecord.
>
> The provenance backpressure is occurring because of the large number of
> provenance events being generated. One event will be generated, more or
> less, each time a Processor touches a FlowFile. So if you merge the
> FlowFiles together as early as possible, you'll reduce the load that
> you're putting on the Provenance Repository.
>
> Also, depending on how you're getting the data into your flow, if you're
> able, it is best to receive a larger "micro-batch" of records per
> FlowFile to begin with and not split them up. This would greatly
> alleviate the pressure on the Provenance Repository and avoid needing
> multiple MergeRecord processors as well.
>
> Also, of note, there is a newer version of the Provenance Repository that
> you can switch to by changing the
> "nifi.provenance.repository.implementation" property in
> from "org.apache.nifi.provenance.PersistentProvenanceRepository"
> to "org.apache.nifi.provenance.WriteAheadProvenanceRepository". The
> Write-Ahead version is quite a bit faster and behaves differently than the
> Persistent Provenance Repo, so you won't see those warnings about
> provenance backpressure.
>
> I hope this helps!
> -Mark
> > On Apr 13, 2018, at 10:30 AM, DEHAY Aurelien wrote:
> >
> > Hello.
> >
> > It's me again regarding my MergeRecord question.
> >
> > I still can't get it to do what I want. I think I may have understood
> > how bin-based processors work; this is for clarification, and I also
> > have a question regarding performance.
> >
> > I want to merge a huge number of 300-byte FlowFiles into 128 MB Parquet
> > files.
> >
> > My understanding is that, for MergeRecord to be able to create a bin
> > with 128 MB of data, that data must already be in the queue. We can't
> > feed the bin "one flow at a time", so when working with small FlowFiles
> > I have to set the back pressure parameter to something really high, or
> > remove the FlowFile-count back pressure limit completely.
> >
> > I understood by reading
> > that it's not the "good" way to do it, and that I should instead
> > cascade multiple merges to "slowly" make the FlowFiles bigger?
> >
> > I've run some tests with a single level, but I hit the "provenance
> > recording rate" warning. Will multiple levels help?
> >
> > Thanks for any help.
> >
> > Aurélien.
> >

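To put numbers on the cascading advice, here is a back-of-the-envelope
Python sketch using the figures from this thread (300-byte records, 128 MB
Parquet files, ~1 MB first-level bins). It treats 1 MB as 1,048,576 bytes,
assumes a per-connection back pressure threshold of 10,000 FlowFiles (a
common default, but check your own connections), and applies the "about one
provenance event per Processor touch" rule of thumb from the reply above.
The count of processors after the merge is purely hypothetical.

```python
"""Rough sizing for a two-level MergeRecord cascade.

Record and file sizes come from the thread; the back pressure threshold
and processor count are illustrative assumptions, not measured values.
"""

RECORD_BYTES = 300
MB = 1024 * 1024                  # treating 1 MB as 1,048,576 bytes
TARGET_BYTES = 128 * MB           # final Parquet file size
FIRST_LEVEL_BYTES = 1 * MB        # first MergeRecord bin size
BACKPRESSURE_OBJECTS = 10_000     # assumed per-connection object threshold
HYPOTHETICAL_PROCESSORS = 3       # hypothetical processors that touch each FlowFile


def flowfiles_per_bin(bin_bytes, item_bytes):
    """How many input FlowFiles of item_bytes fit in one bin of bin_bytes."""
    return bin_bytes // item_bytes


def provenance_events(flowfiles_touched, processors):
    """Rule of thumb: about one event per processor per FlowFile it touches."""
    return flowfiles_touched * processors


# One MergeRecord: the whole 128 MB bin must be queued as tiny FlowFiles.
single_level = flowfiles_per_bin(TARGET_BYTES, RECORD_BYTES)

# Cascade: ~1 MB bins first, then 128 of those bundles per final file.
first_level = flowfiles_per_bin(FIRST_LEVEL_BYTES, RECORD_BYTES)
second_level = flowfiles_per_bin(TARGET_BYTES, FIRST_LEVEL_BYTES)

# Events from the hypothetical processors alone (the merge steps' own
# events are left out): run on every tiny FlowFile vs. on one merged file.
events_unmerged = provenance_events(single_level, HYPOTHETICAL_PROCESSORS)
events_merged = provenance_events(1, HYPOTHETICAL_PROCESSORS)

if __name__ == "__main__":
    for label, queued in [("single level", single_level),
                          ("cascade level 1", first_level),
                          ("cascade level 2", second_level)]:
        fits = queued <= BACKPRESSURE_OBJECTS
        print(f"{label}: {queued:,} FlowFiles per bin, under threshold: {fits}")
    print(f"processor events per 128 MB: {events_unmerged:,} unmerged "
          f"vs {events_merged} merged")
```

With these inputs a single-level bin needs 447,392 FlowFiles queued (in line
with the "about 500,000" estimate), far above a 10,000-object threshold,
while the cascade needs 3,495 FlowFiles per first-level bin and then only
128 bundles per second-level bin.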