Mark, I asked this question a few weeks ago.
Here's the thread (Subject: Interaction between MergeContent parameters):
http://mail-archives.apache.org/mod_mbox/nifi-users/201510.mbox/%3CCACkT4wYfUS_9WFuK8YUy88AKKGTTrgvpNOFhxok_QUA_J%3DCtGg%40mail.gmail.com%3E

Cheers

On Fri, Nov 6, 2015 at 4:05 PM, Mark Petronic <[email protected]> wrote:
> I was expecting that, if I set min bin size to 128 MB and max to 512 MB and
> bin duration to 60s and max bins to 100 and if data was flowing quickly enough
> so that I received more than 512 MB in 60 sec (all flow files are keyed to
> the same correlation variable in this case), that I would see output flow
> files of around the max of 512 MB. But that is not what I see. I played
> around with changing the max bins and duration but still don't seem to be
> able to "force" large files. Instead I see files around 100-150 MB. Can
> someone point me to a more detailed description of how the binning logic
> works? I would like to understand the interplay between the number of bins,
> duration, and size when you have sets of flow files coming in that are
> linked to different correlation variables. In my case, if I process all my
> file types, I have about 19 different classes of data, so there are 19
> different values for the correlation variable I use, "StatClass". Why would
> one want many or few max bins? Will a larger value of duration put more
> memory pressure on the JVM, or are the bins accumulated as files on disk
> rather than in memory? I am trying to produce large files for HDFS storage
> from a stream of many smaller files.
>
> Thanks
