Hi, we are setting up a flow that ingests small records (each flowfile containing a single short CSV line) from Kafka, then processes, filters, routes, merges, and compresses them before writing the result to HDFS.
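For context, the flow looks roughly like this (we are on Apache NiFi; exact processor names below are approximate, since we use the record-oriented processors):

    ConsumeKafkaRecord -> (process / filter / route) -> MergeRecord -> CompressContent -> PutHDFS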
We do not fully understand how the merge processor should be configured, as it does not behave the way we expect. We want it to merge records into flowfiles that will, in the end, fill our HDFS blocks (128 MB for now). Here are the merge processor parameters:

- Min Number of Records: 1,000,000
- Min Bin Size: 200 MB
- Max Bin Size: 250 MB
- Max Number of Bins: 1

In the Settings tab we left the Concurrent Tasks parameter at 1. As you can see, we set a Max Bin Size higher than our real target, because the flowfiles are compressed by a later processor. But we observe that, even though we specified a Max Number of Bins of 1 and a Min Bin Size of 200 MB, the resulting behaviour barely respects those parameters: the processor emits two small flowfiles (around 25 MB each) at a time, while the queue contains enough elements to fill one of 128 MB. So our question is: is there a way to configure our processors to achieve our goal of filling HDFS with files of around 120 MB? Thanks in advance, Gautier.
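P.S. A quick back-of-the-envelope check on our sizing, in case our reasoning is wrong somewhere (the ~2:1 compression ratio is our own assumption for this kind of repetitive CSV data, not a measured figure):

    target HDFS file size:           ~120 MB (just under the 128 MB block)
    assumed compression ratio:       ~2:1 on short CSV lines
    => uncompressed bin size needed: ~240 MB, hence Min 200 MB / Max 250 MB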
