I have a flow that sends a large number of JSON files into a MergeContent
processor. The job of that processor is to combine all the incoming flow files
that share the same value of a particular flow file attribute into a single flow file, creating a JSON
array containing each of the input flow files’ JSON.
I have configured the MergeContent processor as follows:
Merge Strategy: Bin-Packing Algorithm
Merge Format: Binary Concatenation
Correlation Attribute Name: ${myFlowfileAttributeName}
Minimum number of entries: 1
Maximum number of entries: 5000
Minimum group size: 0 B
Maximum group size: <no value set>
Max bin age: 30 min
Maximum number of bins: 10
Delimiter strategy: Text
Header: [
Footer: ]
Demarcator: ,
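To show the output I am after from the Header, Footer, and Demarcator settings, here is a minimal Python sketch. It is not NiFi code: the function name merge_with_delimiters and the sample JSON strings are made up for illustration, and it only mimics what I understand binary concatenation with text delimiters to do.

```python
import json

def merge_with_delimiters(contents, header="[", footer="]", demarcator=","):
    """Join the contents with the demarcator, then wrap them in header and footer.

    Hypothetical helper that imitates the Header/Footer/Demarcator settings above.
    """
    return header + demarcator.join(contents) + footer

# Three made-up flow file contents, each one a JSON object.
flow_files = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

merged = merge_with_delimiters(flow_files)
print(merged)  # [{"id": 1},{"id": 2},{"id": 3}]

# The merged text should parse as a single JSON array.
parsed = json.loads(merged)
```

With these delimiters, each merged flow file should be a valid JSON array holding one element per input flow file.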
When I run data through this flow, I see a large number of small-ish
merged flow files being sent to the merged relationship. I was expecting ALL of
the files for a given flow file attribute value to be binned together, but they
are not coming through that way. To give an example, I pushed through data
containing 262 input JSON files. Of these 262, 2 of them have a flow file
attribute value of ‘A’, 2 of them have a flow file attribute value of ‘B’, and
258 have a flow file attribute of ‘C’. I was expecting the merged relationship
to deliver 3 flow files, one each for value A, B, and C. But I am seeing 24
flow files on the merged relationship, 1 for a value of A, 1 for a value of B,
and 22 of varying sizes with the value of C.
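The arithmetic behind my expectation can be sketched in plain Python (again, not NiFi code; the variable names are only for illustration). It assumes one bin per distinct attribute value, which is how I understood the correlation attribute to work, and that a bin is not closed before it reaches the maximum number of entries.

```python
from collections import Counter

# Attribute values of the 262 input flow files described above.
attribute_values = ["A"] * 2 + ["B"] * 2 + ["C"] * 258
counts = Counter(attribute_values)

MAX_ENTRIES = 5000  # "Maximum number of entries" from my configuration

# Assumption: one bin per distinct value, since no group exceeds MAX_ENTRIES.
fits_in_one_bin = all(n <= MAX_ENTRIES for n in counts.values())
expected_merged = len(counts)

# What I actually see on the merged relationship.
observed = {"A": 1, "B": 1, "C": 22}
observed_merged = sum(observed.values())

print(len(attribute_values), expected_merged, observed_merged)  # 262 3 24
```

None of the three groups comes anywhere near 5000 entries, so the entry limit alone does not explain why the C files are split into 22 merged flow files.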
Can someone help me understand what other criteria MergeContent might be using
to determine when to send along its merged flow files?
Thanks