Hello Andre,

The inner workings of the MergeContent processor can certainly be challenging to understand. If you are running into the nifi.queue.swap.threshold limit described in NIFI-697, increase that value in the nifi.properties file and restart NiFi. A multiple of 10000 is recommended. You will likely also have to increase your Java memory settings in bootstrap.conf.
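As a rough illustration of those two changes, the fragment below shows what the edited files might look like. The values are only examples: 200000 is picked because it is a multiple of 10000 and matches your Maximum Number of Entries, the heap sizes are arbitrary, and the java.arg.N numbering is assumed from a stock bootstrap.conf, so check your own files before copying anything.

```
# conf/nifi.properties
# Illustrative value only: a multiple of 10000, raised from the 20000 seen in this thread
nifi.queue.swap.threshold=200000

# conf/bootstrap.conf
# Illustrative heap sizes; the java.arg.N numbers are assumed and may differ in your install
java.arg.2=-Xms2g
java.arg.3=-Xmx2g
```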
MergeContent works like this:

When a FlowFile arrives at MergeContent, it is assigned to a bin based on Merge Strategy and Correlation Attribute Name. Maximum Number of Bins controls resource usage: if all bins have FlowFiles in them and another FlowFile arrives that doesn't fit into any of those bins, the oldest bin is automatically marked as complete, and the new FlowFile starts its own new bin.

A bin is complete once

  ((number of files in bin) >= Minimum Number of Entries
   AND (number of bytes in bin) >= Minimum Group Size)
  OR the bin has existed for Max Bin Age.

Then the FlowFiles in the bin are merged and sent to an output relationship.

The Maximum Number of Entries and Maximum Group Size prevent bins from becoming "over full". For example, if Maximum Group Size is 1 GB and a bin currently holds 900 MB when a 200 MB FlowFile arrives, the 200 MB FlowFile is not added to that bin, since that would make it "over full". It gets a bin all to itself instead.

Hope this helps,
-- Mike

On Wed, Oct 21, 2015 at 5:20 PM, Andre <[email protected]> wrote:

> hi there,
>
> Would anyone be able to describe how the different MergeContents work
> together?
>
> I've set up a merger with settings:
>
> Minimum Number of Entries: 1
> Maximum Number of Entries: 200000 (I also tried it unset and the same
> thing happened)
> Minimum Group Size = 128 MB
> Maximum Group Size = 1GB
> Max Bin Age = 120 sec
> Maximum number of Bins = 100
>
>
> And yet, every time the files are saved into disk they have 17MiB and
> sharp 20000 lines, a number which according to NIFI-697 comes from
> nifi.queue.swap.threshold ?
>
> If I could describe what I want, it would be to:
>
> Merge individual flows to a minimum of 128MB and up to 1GB unless
> there's no further data arriving within a 120 sec window
>
>
> I thank you in advance
>
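The bin rules described in the reply above can be sketched in a few lines of Python. This is only a model of the rule as stated in this thread, not NiFi's actual implementation: the MergeSettings class and both function names are hypothetical, and the defaults simply copy the settings from the original question.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class MergeSettings:
    # Hypothetical container; field names mirror the MergeContent properties in the thread.
    min_entries: int = 1
    max_entries: int = 200_000
    min_group_size: int = 128 * MB
    max_group_size: int = 1 * GB
    max_bin_age_sec: float = 120.0


def bin_is_complete(entries, size_bytes, age_sec, s):
    """True when both minimums are met, or the bin has reached Max Bin Age."""
    minimums_met = entries >= s.min_entries and size_bytes >= s.min_group_size
    return minimums_met or age_sec >= s.max_bin_age_sec


def fits_in_bin(entries, size_bytes, flowfile_bytes, s):
    """True when adding the FlowFile keeps the bin within both maximums."""
    return (entries + 1 <= s.max_entries
            and size_bytes + flowfile_bytes <= s.max_group_size)


settings = MergeSettings()

# 17 MB after 10 seconds: below Minimum Group Size and younger than Max Bin Age.
print(bin_is_complete(20_000, 17 * MB, 10, settings))    # False
# The same bin once it has existed for 120 seconds.
print(bin_is_complete(20_000, 17 * MB, 120, settings))   # True
# 900 MB bin plus a 200 MB FlowFile would exceed the 1 GB maximum.
print(fits_in_bin(1_000, 900 * MB, 200 * MB, settings))  # False
```

Under this model, a 17 MB bin would only be merged early if Max Bin Age expired, which is why a merge at exactly 20000 entries points to the swap threshold rather than to these settings.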
