Hi David,

With Merge Strategy set to Bin-Packing Algorithm and no Correlation Attribute Name set, you are most likely already using just one bin. You didn't say whether you have a Max Bin Age set. If you do, MergeContent could definitely produce an output file of only a few MB when not many files flow through within that time period.

-- Mike

On Thu, Oct 22, 2015 at 3:06 PM, David Klim <[email protected]> wrote:

> Hello Mike,
>
> So if the merge strategy is binary concatenation, no correlation
> attribute, and minimum group size is 250MB, max number of bins is 100 and
> min number of entries is 10000, why am I still getting files of a few MB?
>
> Maybe because of the bins? Can I set it so that there is only a single bin
> for them all?
>
> Thanks
>
> ------------------------------
> Date: Wed, 21 Oct 2015 17:53:50 -0400
> Subject: Re: Interaction between MergeContent parameters
> From: [email protected]
> To: [email protected]
>
> Hello Andre,
>
> The inner workings of the MergeContent processor can certainly be
> challenging to understand. If you are running into the
> nifi.queue.swap.threshold limit of MergeContent as described in NIFI-697,
> then you should increase that value in the nifi.properties file and
> restart your NiFi process. A multiple of 10000 is recommended. You will
> also likely have to increase your Java memory settings in bootstrap.conf.
>
> MergeContent works like this:
>
> When a FlowFile arrives at MergeContent, it is assigned to a bin based on
> Merge Strategy and Correlation Attribute Name. Maximum Number of Bins
> controls resource usage such that if all bins have FlowFiles in them and
> another FlowFile arrives that doesn't fit into one of those bins, then the
> oldest bin is automatically marked as complete, and the new FlowFile starts
> its own new bin.
>
> A bin will be complete once (number of files in bin) >= Minimum Number of
> Entries AND (number of bytes in bin) >= Minimum Group Size OR the bin has
> existed for Max Bin Age. Then the FlowFiles in the bin are merged and sent
> to an output relationship.
>
> The Maximum Number of Entries and Maximum Group Size can prevent bins from
> becoming "over full". For example, when Maximum Group Size is 1 GB, a bin
> currently has 900 MB in it, and a FlowFile arrives that is 200 MB in size,
> the 200 MB FlowFile will not make that bin "over full" but instead will
> get a bin all to itself.
>
> Hope this helps,
> -- Mike
>
> On Wed, Oct 21, 2015 at 5:20 PM, Andre <[email protected]> wrote:
>
> hi there,
>
> Would anyone be able to describe how the different MergeContent
> parameters work together?
>
> I've set up a merger with settings:
>
> Minimum Number of Entries: 1
> Maximum Number of Entries: 200000 (I also tried it unset and the same
> thing happened)
> Minimum Group Size = 128 MB
> Maximum Group Size = 1GB
> Max Bin Age = 120 sec
> Maximum number of Bins = 100
>
> And yet, every time the files are saved to disk they are 17 MiB and
> exactly 20000 lines, a number which according to NIFI-697 comes from
> nifi.queue.swap.threshold?
>
> If I could describe what I want, it would be to:
>
> Merge individual flows to a minimum of 128MB and up to 1GB
> unless
> there's no further data arriving within a 120 sec window
>
> I thank you in advance
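
For anyone who finds the bin completion rule easier to read as code, here is a minimal Python sketch of the rule as it is described in this thread. It is an illustration only and not NiFi's actual implementation: the names MergeSettings, bin_is_complete and fits_in_bin are hypothetical, sizes are assumed to be in bytes with 1 MB taken as 1024 * 1024, and Maximum Number of Entries and bin eviction are left out for brevity.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class MergeSettings:
    # Hypothetical field names mirroring the processor properties in the thread.
    min_entries: int
    min_group_size: int      # bytes
    max_group_size: int      # bytes
    max_bin_age_sec: float


def bin_is_complete(entries, size_bytes, age_sec, s):
    """Rule as stated in the thread:
    (entries >= minimum AND bytes >= minimum) OR bin age >= Max Bin Age."""
    full_enough = entries >= s.min_entries and size_bytes >= s.min_group_size
    too_old = age_sec >= s.max_bin_age_sec
    return full_enough or too_old


def fits_in_bin(bin_bytes, flowfile_bytes, s):
    """A FlowFile fits only if it would not push the bin past Maximum Group Size."""
    return bin_bytes + flowfile_bytes <= s.max_group_size


# Settings taken from Andre's original question.
settings = MergeSettings(min_entries=1, min_group_size=128 * MB,
                         max_group_size=1 * GB, max_bin_age_sec=120)

# A 17 MiB bin that has reached Max Bin Age is complete via the age clause alone.
print(bin_is_complete(20000, 17 * MB, 120, settings))   # True
# The same bin after only 30 seconds satisfies neither clause.
print(bin_is_complete(20000, 17 * MB, 30, settings))    # False
# 900 MB + 200 MB exceeds 1 GB, so the 200 MB FlowFile needs a different bin.
print(fits_in_bin(900 * MB, 200 * MB, settings))        # False
```

The point of the sketch is that the age clause is an OR: a bin that never reaches Minimum Group Size is still merged once Max Bin Age expires, which is one way small output files can appear even with large minimums configured.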
