Hi David,

With Merge Strategy set to Bin-Packing Algorithm and no Correlation
Attribute Name set, you are most likely using just 1 bin already.  You
didn't say whether you have a Max Bin Age set.  If you do, then MergeContent
could definitely produce an output file of only a few MB when not many
files flow through within that time period.

-- Mike


On Thu, Oct 22, 2015 at 3:06 PM, David Klim <[email protected]> wrote:

> Hello Mike,
>
> So if the merge strategy is binary concatenation, no correlation
> attribute, and minimum group size is 250MB, max number of bins is 100 and
> min number of entries is 10000, why am I still getting files of a few MB?
>
> Maybe because of the bins? Can I set it so that there is only a single bin
> for them all?
>
> Thanks
>
>
>
> ------------------------------
> Date: Wed, 21 Oct 2015 17:53:50 -0400
> Subject: Re: Interaction between MergeContent parameters
> From: [email protected]
> To: [email protected]
>
>
> Hello Andre,
>
> The inner workings of the MergeContent processor can certainly be
> challenging to understand.  If you are running into the
> nifi.queue.swap.threshold limit described in NIFI-697, then you should increase
> that value in the nifi.properties file and restart your NiFi process.  A
> multiple of 10000 is recommended.  You will also likely have to increase
> your Java memory settings in bootstrap.conf.
>
> MergeContent works like this:
>
> When a FlowFile arrives at MergeContent, it is assigned to a bin based on
> Merge Strategy and Correlation Attribute Name.  Maximum Number of Bins
> controls resource usage such that if all bins have FlowFiles in them and
> another FlowFile arrives that doesn't fit into one of those bins, then the
> oldest bin is automatically marked as complete, and the new FlowFile starts
> its own new bin.
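The bin assignment described above can be sketched roughly as follows. This is only an illustration, not NiFi's actual code: `assign_to_bin`, `key`, and `max_bins` are hypothetical names, `key` stands in for the value of the Correlation Attribute (a single constant key when none is set), and size limits are ignored.

```python
from collections import OrderedDict


def assign_to_bin(bins, key, flowfile, max_bins):
    """Add flowfile to the bin for `key`; return an evicted bin, or None.

    `bins` is an OrderedDict kept in bin-creation order, so the first
    entry is always the oldest bin.
    """
    evicted = None
    if key not in bins:
        if len(bins) >= max_bins:
            # Every bin is in use: the oldest bin is marked complete.
            _, evicted = bins.popitem(last=False)
        # The new FlowFile starts its own new bin.
        bins[key] = []
    bins[key].append(flowfile)
    return evicted


bins = OrderedDict()
assign_to_bin(bins, "a", "f1", max_bins=2)
assign_to_bin(bins, "b", "f2", max_bins=2)
evicted = assign_to_bin(bins, "c", "f3", max_bins=2)
```

With no correlation attribute every FlowFile would share the same key in this sketch, so only one bin is ever open.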
>
> A bin is complete once both minimums are met, i.e. (number of files in
> bin) >= Minimum Number of Entries AND (number of bytes in bin) >= Minimum
> Group Size, OR once the bin has existed for Max Bin Age, whichever happens
> first.  Then the FlowFiles in the bin are merged and sent to an output
> relationship.
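As a rough sketch of that completion rule (the function and parameter names here are made up for illustration and are not NiFi's API), the check could be written as:

```python
MIB = 1024 * 1024


def bin_is_complete(num_files, num_bytes, age_seconds,
                    min_entries, min_group_bytes, max_bin_age_seconds=None):
    """Return True when a bin should be merged and sent on."""
    # Both minimums must be satisfied together...
    full_enough = num_files >= min_entries and num_bytes >= min_group_bytes
    # ...unless the bin has simply been around for too long.
    too_old = (max_bin_age_seconds is not None
               and age_seconds >= max_bin_age_seconds)
    return full_enough or too_old


# Settings from the original question: 1 entry, 128 MB, 120 sec.
young = bin_is_complete(20000, 17 * MIB, 30, 1, 128 * MIB, 120)
aged = bin_is_complete(20000, 17 * MIB, 120, 1, 128 * MIB, 120)
```

Under these assumptions a 17 MiB bin is held back while it is young (`young` is False) and is released only once it reaches Max Bin Age (`aged` is True).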
>
> The Maximum Number of Entries and Maximum Group Size can prevent bins from
> becoming "over full".  For example, when Maximum Group Size is 1 GB, a
> bin currently has 900 MB in it, and a FlowFile arrives that is 200 MB in
> size, the 200 MB FlowFile will not make that bin "over full" but instead
> will get a bin all to itself.
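The 900 MB + 200 MB example can be checked with a small sketch. The helper name `fits_in_bin` and its parameters are hypothetical, and decimal units (1 GB = 1000 MB) are assumed only to keep the arithmetic simple.

```python
MB = 10**6
GB = 10**9


def fits_in_bin(bin_bytes, bin_entries, flowfile_bytes,
                max_group_bytes=None, max_entries=None):
    """Return True if adding the FlowFile keeps the bin within its maximums."""
    if max_group_bytes is not None and bin_bytes + flowfile_bytes > max_group_bytes:
        return False  # would exceed Maximum Group Size
    if max_entries is not None and bin_entries + 1 > max_entries:
        return False  # would exceed Maximum Number of Entries
    return True


# 900 MB already in the bin, 200 MB arriving, 1 GB maximum.
fits = fits_in_bin(900 * MB, 10, 200 * MB, max_group_bytes=1 * GB)
```

Here `fits` is False, so in this sketch the 200 MB FlowFile would go to a new bin rather than the existing one.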
>
> Hope this helps,
> -- Mike
>
>
>
> On Wed, Oct 21, 2015 at 5:20 PM, Andre <[email protected]> wrote:
>
> hi there,
>
> Would anyone be able to describe how the different MergeContent
> parameters work together?
>
> I've set up a merger with settings:
>
> Minimum Number of Entries: 1
> Maximum Number of Entries: 200000 (I also tried it unset and the same
> thing happened)
> Minimum Group Size: 128 MB
> Maximum Group Size: 1 GB
> Max Bin Age: 120 sec
> Maximum Number of Bins: 100
>
>
> And yet, every time the files are saved to disk they are 17 MiB and
> exactly 20000 lines, a number which according to NIFI-697 comes from
> nifi.queue.swap.threshold?
>
> If I could describe what I want, it would be to:
>
> Merge individual flows to a minimum of 128MB and up to 1GB
> unless
> there's no further data arriving within a 120 sec window
>
>
> I thank you in advance
>
>
>
