Hello Mike,
So if the merge strategy is binary concatenation, there is no correlation 
attribute, the minimum group size is 250 MB, the maximum number of bins is 100, 
and the minimum number of entries is 10000, why am I still getting files of 
only a few MB? 
Could it be because of the bins? Can I configure it so that a single bin is 
used for all of them?
Thanks


Date: Wed, 21 Oct 2015 17:53:50 -0400
Subject: Re: Interaction between MergeContent parameters
From: [email protected]
To: [email protected]

Hello Andre,

The inner workings of the MergeContent processor can certainly be challenging 
to understand.  If you are running into the nifi.queue.swap.threshold limit of 
MergeContent as described in NIFI-697, then you should increase that value in 
the nifi.properties file and restart your NiFi process.  A multiple of 10000 is 
recommended.  You will also likely have to increase your Java memory settings 
in bootstrap.conf.

MergeContent works like this:

When a FlowFile arrives at MergeContent, it is assigned to a bin based on Merge 
Strategy and Correlation Attribute Name.  Maximum Number of Bins controls 
resource usage such that if all bins have FlowFiles in them and another 
FlowFile arrives that doesn't fit into one of those bins, then the oldest bin 
is automatically marked as complete, and the new FlowFile starts its own new 
bin.
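
To make the bin bookkeeping above concrete, here is a toy Python sketch.  It 
is only a model of the behaviour described in this message, not NiFi's 
implementation: the BinManager class and its fields are hypothetical names, 
and it assumes bins are keyed by the correlation value alone.

```python
from collections import OrderedDict


class BinManager:
    """Toy model of bin assignment; hypothetical, not NiFi's implementation."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = OrderedDict()  # correlation value -> FlowFile names, oldest bin first
        self.completed = []        # bins that were forced complete

    def add(self, flowfile, correlation_value=None):
        if correlation_value not in self.bins:
            if len(self.bins) >= self.max_bins:
                # Every bin is in use, so the oldest bin is marked complete.
                _, oldest = self.bins.popitem(last=False)
                self.completed.append(oldest)
            self.bins[correlation_value] = []
        self.bins[correlation_value].append(flowfile)


manager = BinManager(max_bins=2)
for name, key in [("a1", "a"), ("b1", "b"), ("a2", "a"), ("c1", "c")]:
    manager.add(name, key)

print(manager.completed)   # [['a1', 'a2']]
print(list(manager.bins))  # ['b', 'c']
```

In this model, FlowFiles without a correlation value all share the key None, 
so they land in the same bin until that bin is completed.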

A bin is complete once either of these holds: both minimums are met, i.e. 
(number of files in bin) >= Minimum Number of Entries AND (number of bytes in 
bin) >= Minimum Group Size; OR the bin has existed for Max Bin Age.  Then the 
FlowFiles in the bin are merged and sent to an output relationship.
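
As a rough illustration of that rule, here is a small Python sketch.  The 
function name and its arguments are made up for this example and it is not 
NiFi's source code; it assumes the two minimums are combined with AND and the 
age check is a separate OR.  The values are the settings from the original 
question.

```python
def bin_is_complete(entries, size_bytes, age_seconds,
                    min_entries, min_group_size, max_bin_age):
    """Completion rule as described in this message (hypothetical helper)."""
    minimums_met = entries >= min_entries and size_bytes >= min_group_size
    aged_out = age_seconds >= max_bin_age
    return minimums_met or aged_out


MB = 1024 * 1024

# Settings from the original question: at least 1 entry, at least 128 MB, 120 s max age.
settings = dict(min_entries=1, min_group_size=128 * MB, max_bin_age=120)

# A 17 MB bin that is 10 s old stays open: the size minimum is not met.
young_small = bin_is_complete(20000, 17 * MB, 10, **settings)
# The same bin is released once it reaches Max Bin Age, whatever its size.
old_small = bin_is_complete(20000, 17 * MB, 120, **settings)
# A 130 MB bin is released right away because both minimums are met.
young_large = bin_is_complete(150000, 130 * MB, 10, **settings)

print(young_small, old_small, young_large)  # False True True
```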

The Maximum Number of Entries and Maximum Group Size can prevent bins from 
becoming "over full".  For example, when Maximum Group Size is 1 GB, a bin 
currently has 900 MB in it, and a FlowFile arrives that is 200 MB in size, the 
200 MB FlowFile will not be added to that bin (which would make it "over 
full") but will instead get a bin all to itself.
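
The same example as a minimal Python sketch.  The helper names are 
hypothetical and the "first bin with room" placement is an assumption made 
for illustration only, not a statement about how NiFi chooses a bin.

```python
MB = 1024 * 1024
GB = 1024 * MB


def fits_in_bin(bin_bytes, flowfile_bytes, max_group_size):
    """True if adding the FlowFile keeps the bin at or under Maximum Group Size."""
    return bin_bytes + flowfile_bytes <= max_group_size


def place(bins, flowfile_bytes, max_group_size):
    """Add to the first bin with room, else start a new bin. Returns the bin index."""
    for i, bin_bytes in enumerate(bins):
        if fits_in_bin(bin_bytes, flowfile_bytes, max_group_size):
            bins[i] += flowfile_bytes
            return i
    bins.append(flowfile_bytes)
    return len(bins) - 1


bins = [900 * MB]  # one bin that already holds 900 MB
index = place(bins, 200 * MB, max_group_size=1 * GB)
print(index, [b // MB for b in bins])  # 1 [900, 200]
```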

Hope this helps,
-- Mike



On Wed, Oct 21, 2015 at 5:20 PM, Andre <[email protected]> wrote:
Hi there,

Would anyone be able to describe how the different MergeContent parameters
work together?

I've set up a merger with these settings:

Minimum Number of Entries: 1
Maximum Number of Entries: 200000 (I also tried leaving it unset and the
same thing happened)
Minimum Group Size: 128 MB
Maximum Group Size: 1 GB
Max Bin Age: 120 sec
Maximum Number of Bins: 100

And yet, every time the files are saved to disk they are 17 MiB and exactly
20000 lines, a number which, according to NIFI-697, comes from
nifi.queue.swap.threshold?

If I could describe what I want, it would be to:

Merge individual flows to a minimum of 128 MB and up to 1 GB,
unless
there's no further data arriving within a 120 sec window.

I thank you in advance


                                          
