Thank you, Mark.

On Tue, May 31, 2016 at 1:02 PM, Mark Payne <marka...@hotmail.com> wrote:
> Igor,
>
> MergeContent will consider a 'bin' full when any one of those conditions is
> hit. I.e., if you set:
>
> Max Group Size = 64 MB
> Max Number of Entries = 100
> Max Bin Age = 5 mins
>
> Then you will get a merged bin whenever a bin hits 64 MB, regardless of
> how long it's been or how many entries there are.
> Similarly, if you have 100 entries, then you'll get a bin even if the data
> is only 1 MB total.
> Also, if you go 5 minutes without reaching either of those thresholds, the
> 5 minute threshold will cause the bin to be created,
> regardless of how many FlowFiles there are.
>
> A common pattern for sending to HDFS is to set the Maximum Bin Age to some
> threshold (5 mins or 1 hour or whatever makes
> sense for you) and the Min Group Size to 64 MB and Max Group Size to 128
> MB and not set anything for the Maximum Number
> of Entries. In this case, you will get bins of 64 - 128 MB most of the
> time, but if the data volume is low for a while, you'll still get some
> data flowing into HDFS because of the Max Bin Age.
>
> Thanks
> -Mark
>
> > On May 31, 2016, at 12:07 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:
> >
> > There are 2 configuration properties: Maximum Group Size and Maximum
> > Number of Entries.
> > Are these mutually exclusive? I want to create a file to store in HDFS
> > but limit its size to 64 MB, the HDFS block size (or should I go bigger?).
> >
> > Max Bin Age property
> > Since content can vary in length and I do not know when the max size
> > will be reached, what role will it play?
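
For anyone following along, here is a rough sketch in Python of the "any one of those conditions" rule Mark describes. It is only an illustration of the logic in his reply, not NiFi's actual code: `BinLimits` and `should_merge` are made-up names, and the sketch deliberately leaves out the minimum thresholds (Min Group Size and so on) to keep the three maximums in focus.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class BinLimits:
    """Hypothetical container for the three maximums; None means 'not set'."""
    max_group_size: Optional[int] = None   # bytes
    max_entries: Optional[int] = None      # number of FlowFiles
    max_bin_age: Optional[float] = None    # seconds


def should_merge(limits: BinLimits, total_bytes: int,
                 entry_count: int, age_seconds: float) -> bool:
    """Return True as soon as ANY configured maximum is reached."""
    size_hit = (limits.max_group_size is not None
                and total_bytes >= limits.max_group_size)
    count_hit = (limits.max_entries is not None
                 and entry_count >= limits.max_entries)
    age_hit = (limits.max_bin_age is not None
               and age_seconds >= limits.max_bin_age)
    return size_hit or count_hit or age_hit


# The settings from Mark's example: 64 MB, 100 entries, 5 minutes.
example = BinLimits(max_group_size=64 * MB, max_entries=100, max_bin_age=5 * 60)

# Similar to the HDFS pattern, with Maximum Number of Entries left unset
# (the Min Group Size part of that pattern is not modeled here).
no_entry_limit = BinLimits(max_group_size=128 * MB, max_bin_age=5 * 60)
```

With `example`, a bin holding 100 entries merges even at 1 MB, and a 5-minute-old bin merges even with a handful of small FlowFiles. With `no_entry_limit`, the entry count never triggers a merge on its own, so only size or age does.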