Hi all, I have a job consisting of the following steps: first it consumes data from Kafka, then it packs the data into one file every 5 minutes, and finally it puts the packed file into HDFS. I use the [MergeContent] processor to accomplish the "packing" step. The MergeContent properties I configured are listed below:
----------------------
Merge Strategy: Bin-Packing Algorithm
Merge Format: Binary Concatenation
Attribute Strategy: Keep Only Common Attributes
Correlation Attribute Name: No value set
Metadata Strategy: Do Not Merge Uncommon Metadata
Minimum Number of Entries: 1
Maximum Number of Entries: 999999999
Minimum Group Size: 255 MB
Maximum Group Size: No value set
Max Bin Age: 5 minutes
Maximum number of Bins: 1
----------------------

I find the behavior of the MergeContent processor very unpredictable. Several workflows run on the same NiFi instance with identical MergeContent configurations. Some of them correctly pack the data into one file every 5 minutes, but others do not. In some cases a MergeContent processor even generates one flowfile per record. Am I misunderstanding how the MergeContent processor works? I'm new to NiFi, so any help is appreciated. Thanks!
