Greetings

NiFi 2.4 user here (I plan to upgrade but have just not gotten to it yet).

I believe I may have found an issue with MergeContent in Defragment mode when 
the maximum number of bins is set too low.

I recreated it with a test flow. But before I report it as a bug, I would like 
someone to validate that my assumptions are correct.

I've set up a test flow that generates 40,000 flow files. Each flow file has 
zero-byte content and very few attributes other than fragment.count, 
fragment.identifier, and fragment.index. Across the 40,000 flow files there are 
10,000 unique values of fragment.identifier, fragment.count is a constant 4, 
and fragment.index ranges from 0 to 3.

I timed the flow specifically so that all 40,000 flow files are sitting in the 
queue right at the input of a single MergeContent processor. (Note that in this 
example NiFi is standalone, so there are no clustering issues.)
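
For anyone who wants to recreate the same input, something along these lines in 
an ExecuteScript (Groovy) processor should produce an equivalent set of 
fragments. This is just a sketch of the test data layout, not necessarily how I 
built my own flow:

    // Sketch only: emit 10,000 groups of 4 zero-byte fragments each,
    // matching the attribute layout described above.
    10000.times {
        def fragmentId = UUID.randomUUID().toString()
        4.times { i ->
            def ff = session.create()
            ff = session.putAllAttributes(ff, [
                'fragment.identifier': fragmentId,
                'fragment.count'     : '4',
                'fragment.index'     : i.toString()
            ])
            session.transfer(ff, REL_SUCCESS)
        }
    }

The key point is just the attribute layout: one shared fragment.identifier per 
group, a fragment.count of 4, and fragment.index values 0 through 3.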

My trouble seems related to the maximum number of bins. If the maximum is LESS 
THAN 2500, I get a lot of flow files routed to failure, indicating that not all 
of the fragments are present.
If the maximum is 5000 or more, everything merges FINE (I haven't narrowed it 
down any further than that), and I end up back with the original 10,000 flow 
files, as I should.

Admittedly, the maximum number of bins arguably SHOULD be 10,000 for this test 
case (one bin per fragment.identifier). But from my reading, it's not supposed 
to work that way: the processor SHOULD recycle bins as needed. Granted, that 
would be SLOW, but it shouldn't ERROR. It also doesn't make sense that 5000 
worked; it feels arbitrary, given that 2500 did NOT.

I noticed this because, while authoring a new flow, I accidentally left the 
maximum number of bins at its default value of 5, and it had trouble.

So the ultimate question: is this a bug I should report, or am I not 
understanding something fundamental?

Geoffrey Greene
ATF / Senior Software Ninjaneer

