Hello Andre,

The inner workings of the MergeContent processor can certainly be
challenging to understand.  If you are running into the
nifi.queue.swap.threshold limit described in NIFI-697, increase that value
in the nifi.properties file and restart NiFi.  A multiple of 10000 is
recommended.  You will also likely have to increase your Java memory
settings in bootstrap.conf.
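
As a rough illustration, the two changes might look like the fragment
below.  The numbers are placeholders I picked for the example, not tuned
recommendations, and I am assuming the stock bootstrap.conf where
java.arg.2 and java.arg.3 carry the heap settings; check your own files
before copying anything.

```properties
# conf/nifi.properties
# (example value only; the source advice is "a multiple of 10000")
nifi.queue.swap.threshold=200000

# conf/bootstrap.conf
# (assumed to be the heap entries in a stock install; sizes are examples)
java.arg.2=-Xms2g
java.arg.3=-Xmx2g
```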

MergeContent works like this:

When a FlowFile arrives at MergeContent, it is assigned to a bin based on
Merge Strategy and Correlation Attribute Name.  Maximum Number of Bins
controls resource usage such that if all bins have FlowFiles in them and
another FlowFile arrives that doesn't fit into one of those bins, then the
oldest bin is automatically marked as complete, and the new FlowFile starts
its own new bin.
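
To make the bin assignment concrete, here is a minimal Python sketch of
the behavior described above.  It is not NiFi's actual code: the function
name assign_to_bin, the use of a plain string as the bin key, and the
OrderedDict bookkeeping are all assumptions made for illustration.

```python
from collections import OrderedDict


def assign_to_bin(bins, key, flowfile, max_bins):
    """Put flowfile into the bin for key; return an evicted bin or None.

    bins is an OrderedDict mapping a bin key (hypothetical stand-in for
    the correlation attribute value) to a list of FlowFiles, oldest
    bin first.
    """
    evicted = None
    if key not in bins and len(bins) >= max_bins:
        # Every bin is in use and none matches: the oldest bin is
        # marked complete to make room for a new one.
        _, evicted = bins.popitem(last=False)
    bins.setdefault(key, []).append(flowfile)
    return evicted
```

With max_bins set to 2, a third distinct key forces the oldest bin out,
and the caller would then merge the returned list.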

A bin is complete once ((number of files in bin) >= Minimum Number of
Entries AND (number of bytes in bin) >= Minimum Group Size) OR the bin has
existed for Max Bin Age.  That is, both minimums must be met together, but
the age limit alone is enough.  Then the FlowFiles in the bin are merged
and sent to an output relationship.
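
The completion rule can be written as a small predicate.  This is only a
sketch of the rule as stated here, with a hypothetical function name and
parameter names, not the processor's real implementation.

```python
MB = 1024 * 1024


def bin_is_complete(num_files, num_bytes, age_sec,
                    min_entries, min_group_bytes, max_bin_age_sec):
    """Return True when a bin should be merged and sent on."""
    # Both minimums have to be satisfied at the same time...
    reached_minimums = (num_files >= min_entries
                        and num_bytes >= min_group_bytes)
    # ...unless the bin has simply been open for too long.
    expired = age_sec >= max_bin_age_sec
    return reached_minimums or expired
```

With the settings from the original question (Minimum Number of Entries 1,
Minimum Group Size 128 MB, Max Bin Age 120 sec), a 17 MB bin that is 10
seconds old is not complete under this rule, while the same bin at 120
seconds is.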

The Maximum Number of Entries and Maximum Group Size prevent bins from
becoming "over full".  For example, when Maximum Group Size is 1 GB, a bin
currently holds 900 MB, and a 200 MB FlowFile arrives, the 200 MB FlowFile
is not added to that bin (which would make it "over full") but instead
gets a bin all to itself.
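
The arithmetic in that example can be checked with a short sketch.  The
function fits_in_bin and its parameters are hypothetical names for
illustration, and I am assuming 1 GB means 1024 MB.

```python
MB = 1024 * 1024
GB = 1024 * MB


def fits_in_bin(bin_bytes, bin_entries, incoming_bytes,
                max_group_bytes, max_entries):
    """Return True if one more FlowFile can join the bin without
    exceeding either maximum."""
    return (bin_bytes + incoming_bytes <= max_group_bytes
            and bin_entries + 1 <= max_entries)


# 900 MB already in the bin plus a 200 MB arrival is 1100 MB, which is
# more than the 1 GB maximum, so the arrival needs a different bin.
needs_new_bin = not fits_in_bin(900 * MB, 10, 200 * MB, GB, 200000)
```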

Hope this helps,
-- Mike



On Wed, Oct 21, 2015 at 5:20 PM, Andre <[email protected]> wrote:

> hi there,
>
> Would anyone be able to describe how the different MergeContents work
> together?
>
> I've set up a merger with settings:
>
> Minimum Number of Entries: 1
> Maximum Number of Entries: 200000 (I also tried it unset and the same
> thing happened)
> Minimum Group Size = 128 MB
> Maximum Group Size = 1GB
> Max Bin Age = 120 sec
> Maximum number of Bins = 100
>
>
> And yet, every time the files are saved to disk they are 17 MiB and
> exactly 20000 lines, a number which according to NIFI-697 comes from
> nifi.queue.swap.threshold?
>
> If I could describe what I want, it would be to:
>
> Merge individual flows to a minimum of 128MB and up to 1GB
> unless
> there's no further data arriving within a 120 sec window
>
>
> I thank you in advance
>
