The inner workings of MergeContent are certainly a FAQ. This message [1] to the users list from a long time ago may help; I think it's still accurate.

[1] - https://lists.apache.org/thread.html/5ab5d9d0bcd0eef8ace391d00f5f5678427bee4b2fbf1e48d78ea8c8@1445464430@%3Cusers.nifi.apache.org%3E

Regards,
--
Mike

On Fri, Jan 4, 2019 at 6:57 AM Josef wrote:
> Hi Jianan
>
> I would just say that as soon as “Minimum Number of Entries” is reached,
> the flow can be flushed out, and if the minimum number isn’t reached, I
> would expect “Max Bin Age” to take effect. Have you tried that?
>
> Cheers Josef
>
> *From: *Jianan Zhang
> *Date: *Friday, 4 January 2019 at 12:46
> *Subject: *Re: A question about [MergeContent] processor
>
> Hi Josef,
>
> Thanks for the reply. In my opinion, “Minimum Number of Entries” should
> not and cannot be stronger than “Max Bin Age”. Suppose only ONE flowfile
> from the data source is put into the MergeContent processor, and I set
> "Minimum Number of Entries" = 2. Then this ONE flowfile will never come
> out of NiFi, even after the bin reaches its deadline. This can easily
> lead to a deadlock.
>
> Also, I don't know how to use “Merge Strategy: Defragment” to merge the
> flowfiles from Kafka; I really don't know how fast the producer produces
> messages.
>
> Jianan Zhang
>
> On Fri, Jan 4, 2019 at 6:43 PM, Josef wrote:
> > Hi Jianan
> >
> > As you have “Minimum Number of Entries: 1”, it is normal that you see
> > merges with only one flowfile. In my opinion, “Minimum Number of
> > Entries” is stronger than “Max Bin Age” (the first is written in bold and
> > the second is not). Additionally, it is called “Max Bin Age” and not
> > “Bin Age”, so as soon as you reach at least 1 flowfile it could be pushed
> > out. However, in my opinion the documentation for “Max Bin Age” is too
> > unspecific (when does it really take effect?); only the developers know
> > exactly how it works. It would be great to get more information here…
> >
> > Just my 2 cents. Whenever possible, try to use “Merge Strategy:
> > Defragment” instead of the current one, but this works only if it is
> > predictable how many flowfiles you would like to merge. With this
> > strategy, Max Bin Age makes full sense and works as expected.
> >
> > Cheers Josef
> >
> > *From: *Jianan Zhang
> > *Date: *Friday, 4 January 2019 at 11:16
> > *Subject: *A question about [MergeContent] processor
> >
> > Hi all,
> >
> > I have a job consisting of the following steps: first consume data from
> > Kafka, then pack the data every 5 minutes into one file, and finally put
> > the packed file into HDFS.
> >
> > I use the [MergeContent] processor to accomplish the “packing” step. The
> > MergeContent properties I configured are listed below:
> >
> > ----------------------
> > Merge Strategy: Bin-Packing Algorithm
> > Merge Format: Binary Concatenation
> > Attribute Strategy: Keep Only Common Attributes
> > Correlation Attribute Name: No value set
> > Metadata Strategy: Do Not Merge Uncommon Metadata
> > Minimum Number of Entries: 1
> > Maximum Number of Entries: 999999999
> > Minimum Group Size: 255 MB
> > Maximum Group Size: No value set
> > Max Bin Age: 5 minutes
> > Maximum number of Bins: 1
> > ----------------------
> >
> > I found the behavior of the MergeContent processor very hard to control.
> > Several workflows are running on NiFi with the same MergeContent
> > configuration; some of them pack the data every 5 minutes into one file
> > correctly, but others don’t. It has even happened that some MergeContent
> > processors generated one flowfile per record.
> >
> > I am wondering if I am misunderstanding the mechanism of the
> > MergeContent processor.
> >
> > I am a NiFi newbie, please help me.
> >
> > Thanks!
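
To make the disagreement above concrete, here is a small Java sketch of the three conditions Josef and Jianan are weighing against each other. It is a hypothetical, simplified model, not NiFi's MergeContent/BinFiles source: the class, record, and method names are invented for illustration. It assumes that reaching Max Bin Age completes a bin even when the minimums have not been met, which, if it holds, rules out the deadlock Jianan describes.

```java
import java.time.Duration;

// Hypothetical, simplified model of when a MergeContent bin may be merged.
// This is NOT NiFi's BinFiles/BinManager code; all names here are illustrative.
public class BinFlushModel {

    // Thresholds named after the MergeContent properties discussed in the thread.
    public record BinSettings(int minEntries, long maxEntries,
                              long minBytes, long maxBytes, Duration maxBinAge) {}

    // Snapshot of one bin: flowfile count, total bytes, and time since creation.
    public record BinState(int entries, long bytes, Duration age) {}

    // A maximum (count or size) has been reached, so nothing more fits.
    public static boolean isFull(BinState bin, BinSettings s) {
        return bin.entries() >= s.maxEntries() || bin.bytes() >= s.maxBytes();
    }

    // Both minimums (count AND size) are satisfied.
    public static boolean meetsMinimums(BinState bin, BinSettings s) {
        return bin.entries() >= s.minEntries() && bin.bytes() >= s.minBytes();
    }

    // Assumption: reaching Max Bin Age completes the bin even if minimums are unmet.
    public static boolean isExpired(BinState bin, BinSettings s) {
        return bin.age().compareTo(s.maxBinAge()) >= 0;
    }

    // Simplification: a bin meeting its minimums is treated as mergeable right away;
    // the real processor may also weigh factors this model ignores (e.g. pending input).
    public static boolean readyToMerge(BinState bin, BinSettings s) {
        return isFull(bin, s) || meetsMinimums(bin, s) || isExpired(bin, s);
    }

    public static void main(String[] args) {
        // Jianan's deadlock scenario: one flowfile, Minimum Number of Entries = 2,
        // Max Bin Age = 5 minutes. Minimum Group Size is assumed to be 0 B and
        // Maximum Group Size unset (modelled here as Long.MAX_VALUE).
        BinSettings s = new BinSettings(2, 999_999_999L, 0L, Long.MAX_VALUE, Duration.ofMinutes(5));
        BinState young = new BinState(1, 1_024L, Duration.ofMinutes(1));
        BinState old = new BinState(1, 1_024L, Duration.ofMinutes(6));
        System.out.println(readyToMerge(young, s)); // false: below minimum, not yet expired
        System.out.println(readyToMerge(old, s));   // true: Max Bin Age reached
    }
}
```

Under these assumptions the single flowfile is held while the bin is young and released once it ages past five minutes, which is the behavior Josef expected. Keeping the three checks as separate methods makes it easy to see which condition each reading of the documentation depends on.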
