Mark,

I asked this question a few weeks ago.

Here's the thread (Subject: Interaction between MergeContent parameters):

http://mail-archives.apache.org/mod_mbox/nifi-users/201510.mbox/%3CCACkT4wYfUS_9WFuK8YUy88AKKGTTrgvpNOFhxok_QUA_J%3DCtGg%40mail.gmail.com%3E

Cheers


On Fri, Nov 6, 2015 at 4:05 PM, Mark Petronic <[email protected]> wrote:
> I was expecting that, if I set min bin size to 128 MB and max to 512 MB and
> bin duration to 60s and max bins to 100 and if data was flowing quick enough
> so that I received more than 512 MB in 60 sec (all flow files are keyed to
> the same correlation variable in this case), that I would see output flow
> files of around the max of 512 MB. But that is not what I see. I played
> around with changing the max bins and duration but still don't seem to be
> able to "force" large files. Instead I see files around 100-150 MB. Can
> someone point me to a more detailed description of how the binning logic
> works? Would like to understand the interplay between the number of bins,
> duration, and size when you have sets of flow files coming in that are
> linked to different correlation variables. In my case, if I process all my
> file types, I have about 19 different classes of data so there are 19
> different values for the correlation variable I use "StatClass". Why would
> one want many or few max bins? Would a larger value of duration put more
> memory pressure on the JVM or are the bins accumulated as files on disk
> rather than in memory? I am trying to produce large files for HDFS storage
> from a stream of many smaller files.
>
> Thanks
