I was expecting the following. Suppose I set the min bin size to 128 MB, the max bin size to 512 MB, the bin duration to 60 s, and max bins to 100. If data was flowing quickly enough that I received more than 512 MB within 60 seconds (in this case, all flow files are keyed to the same correlation variable), I would see output flow files of around the 512 MB maximum. That is not what I see. I have played around with changing the max bins and duration, but I still can't seem to "force" large files. Instead I see files of around 100-150 MB.

Can someone point me to a more detailed description of how the binning logic works? I would like to understand how the number of bins, the duration, and the size interact when incoming flow files are linked to different correlation variable values. In my case, if I process all my file types, I have about 19 different classes of data, so there are 19 different values for the correlation variable I use, "StatClass".

Why would one want many or few max bins? Does a longer duration put more memory pressure on the JVM, or are the bins accumulated as files on disk rather than in memory? I am trying to produce large files for HDFS storage from a stream of many smaller files.
Thanks
