Re: NIFI-7646 - Improve performance of MergeContent

2021-04-21 Thread Mark Payne
Ryan,

It gets a bit more complex than this, because the FlowFiles may not always be 
read sequentially in exactly the same order that they live on disk, and 
there are concurrent threads and disk accesses to consider, etc. But in the 
best-case scenario, yes, that is accurate.

Keep in mind, though, that what you are comparing there is the performance of 
the disk accesses/reads, and that is, of course, not the entire picture. There 
is a lot more going on under the covers, so if you see a performance improvement 
of 20x in reading the content, that won’t mean a 20x improvement in overall 
throughput.
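
The point about read speedup versus overall throughput can be illustrated with Amdahl's law. This is only a rough sketch: the function name and the read-time fractions below are hypothetical values chosen for illustration, not measurements from NiFi.

```python
def overall_speedup(read_fraction: float, read_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the content-read portion gets faster.

    read_fraction -- share of total processing time spent reading content (0..1),
                     a hypothetical input, not a measured NiFi figure
    read_speedup  -- how many times faster the reads become (e.g. 20.0)
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if read_speedup <= 0:
        raise ValueError("read_speedup must be positive")
    # Time that is not spent reading is unchanged; read time shrinks by read_speedup.
    return 1.0 / ((1.0 - read_fraction) + read_fraction / read_speedup)


# Illustrative fractions only: how much a 20x faster read helps overall.
for fraction in (0.1, 0.5, 0.9):
    print(f"reads = {fraction:.0%} of time -> {overall_speedup(fraction, 20.0):.2f}x overall")
```

Under these assumed fractions, a 20x faster read yields well under 20x overall unless reading content dominates nearly all of the processing time.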

But it sure won’t hurt! :)

-Mark


> On Apr 21, 2021, at 8:34 PM, Ryan Hendrickson wrote:
> 
> https://issues.apache.org/jira/browse/NIFI-7646 - Improve performance of
> MergeContent / others that read content of many small FlowFiles
> 
> Hi,
>   In reference to the ticket above, released in 1.13, the description
> says "if the FlowFile is small, say 200 bytes, the result is that we
> perform 2+ disk accesses to read those 200 bytes (even though 4K - 8K is a
> typical block size and could be read in the same amount of time as those
> 200 bytes)."
> 
>   To clarify, if the FlowFiles are never more than 1K, and the block size
> is 4K, does that mean this improvement will read 4 FlowFiles with the
> resources of 1?
> 
>   This would be a 4:1 improvement.  Or in the 200 byte scenario, it would
> be a 20:1 improvement?
> 
> Thanks,
> Ryan


NIFI-7646 - Improve performance of MergeContent

2021-04-21 Thread Ryan Hendrickson
https://issues.apache.org/jira/browse/NIFI-7646 - Improve performance of
MergeContent / others that read content of many small FlowFiles

Hi,
   In reference to the ticket above, released in 1.13, the description
says "if the FlowFile is small, say 200 bytes, the result is that we
perform 2+ disk accesses to read those 200 bytes (even though 4K - 8K is a
typical block size and could be read in the same amount of time as those
200 bytes)."

   To clarify, if the FlowFiles are never more than 1K, and the block size
is 4K, does that mean this improvement will read 4 FlowFiles with the
resources of 1?

   This would be a 4:1 improvement.  Or in the 200 byte scenario, it would
be a 20:1 improvement?
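
The arithmetic behind those two ratios can be sketched as follows. This is a best-case estimate only, assuming a 4K block means 4096 bytes, that the FlowFiles sit contiguously on disk, and that they are read in on-disk order; the function name is made up for illustration.

```python
def flowfiles_per_block(block_size_bytes: int, flowfile_size_bytes: int) -> int:
    """Best-case count of whole FlowFiles that fit in a single block read.

    Assumes the FlowFiles are packed contiguously and are no larger than
    one block; real layouts and access patterns will usually do worse.
    """
    if block_size_bytes <= 0 or flowfile_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer division: only whole FlowFiles count toward the ratio.
    return block_size_bytes // flowfile_size_bytes


BLOCK_SIZE = 4096  # assumed 4K block size

for size in (1024, 200):
    ratio = flowfiles_per_block(BLOCK_SIZE, size)
    print(f"{size}-byte FlowFiles: up to {ratio} per {BLOCK_SIZE}-byte block read")
```

With these assumptions the results are 4 for 1K FlowFiles and 20 for 200-byte FlowFiles, which matches the 4:1 and 20:1 figures in the question.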

Thanks,
Ryan