Ryan,
It gets a bit more complex than this, because the FlowFiles may not always be
accessed/read sequentially in exactly the same order that they live on disk,
there are concurrent threads/disk accesses to consider, etc. But in the
best-case scenario, yes, that is accurate.
Keep in mind, though, that what you are comparing there is the performance of
the disk accesses/reads, and that is, of course, not the entire picture. There is
a lot more going on under the covers, so if you see a performance improvement of
20x in reading the content, that won’t mean a 20x improvement in overall throughput.
But it sure won’t hurt! :)
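To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 4096-byte block size is taken from the low end of the ticket's "4K - 8K" range, and the 50% figure for time spent reading content is purely a hypothetical value chosen for illustration, not something measured in NiFi. The function names are made up for this example.

```python
def flowfiles_per_block(block_size: int, flowfile_size: int) -> int:
    """Best case: number of whole FlowFiles that fit in one disk block."""
    return max(1, block_size // flowfile_size)


def overall_speedup(read_fraction: float, read_speedup: float) -> float:
    """Amdahl's law: only the content-read share of the work gets faster.

    read_fraction is the (assumed) share of total time spent reading content;
    read_speedup is how much faster those reads become.
    """
    return 1.0 / ((1.0 - read_fraction) + read_fraction / read_speedup)


if __name__ == "__main__":
    block_size = 4096      # bytes, low end of the "4K - 8K" range in the ticket
    read_fraction = 0.5    # hypothetical: half the time is spent reading content
    for flowfile_size in (1024, 200):
        ratio = flowfiles_per_block(block_size, flowfile_size)
        total = overall_speedup(read_fraction, ratio)
        print(f"{flowfile_size} B FlowFiles: up to {ratio}:1 on reads, "
              f"about {total:.2f}x overall")
```

Under those assumptions, even a 20:1 best case on reads works out to a bit under 2x overall, because the other half of the work is unchanged.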
-Mark
> On Apr 21, 2021, at 8:34 PM, Ryan Hendrickson
> wrote:
>
> https://issues.apache.org/jira/browse/NIFI-7646 - Improve performance of
> MergeContent / others that read content of many small FlowFiles
>
> Hi,
> In reference to the ticket above, released in 1.13, the description
> says "if the FlowFile is small, say 200 bytes, the result is that we
> perform 2+ disk accesses to read those 200 bytes (even though 4K - 8K is a
> typical block size and could be read in the same amount of time as those
> 200 bytes)."
>
> To clarify, if the FlowFiles are never more than 1K, and the block size
> is 4k, does that mean this improvement will read 4 FlowFiles with the
> resources of 1?
>
> This would be a 4:1 improvement. Or in the 200 byte scenario, it would
> be a 20:1 improvement?
>
> Thanks,
> Ryan