Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-11 Thread Mark Payne
Hey Phil,

NiFi will not spread the content of a single file over multiple partitions. It 
will write the content of FlowFile 1 to content repo 1, then write the next 
FlowFile to repo 2, etc. so it does round-robin but does not spread a single 
FlowFile across multiple repos.
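
A minimal sketch of the round-robin placement described above, as a toy Python model and not NiFi's actual code. The repo names cont1, cont2, cont3 come from the original question; the function name assign_repos and the FlowFile labels are hypothetical, chosen only for illustration.

```python
from itertools import cycle


def assign_repos(flowfiles, repos):
    """Toy model: each FlowFile's content lands entirely in one repo,
    and repos are chosen in round-robin order (hypothetical helper)."""
    repo_cycle = cycle(repos)
    return {flowfile: next(repo_cycle) for flowfile in flowfiles}


repos = ["cont1", "cont2", "cont3"]
flowfiles = ["flowfile-1", "flowfile-2", "flowfile-3", "flowfile-4"]

placement = assign_repos(flowfiles, repos)
for flowfile, repo in placement.items():
    print(f"{flowfile} -> {repo}")
```

In this model the fourth FlowFile wraps back around to cont1, and no FlowFile is ever split across two repos, so reading one file back only touches one volume.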

Thanks
-Mark


> On Dec 11, 2023, at 8:45 PM, Phillip Lord wrote:
> 
> 
> Hello Nifi comrades,
> 
> Here's my scenario...
> Let's say I have a Nifi cluster running on EC2 instances with attached EBS 
> volumes serving as their repos.  They've split up their content-repos into 
> three content-repos per node (cont1, cont2, cont3), each being a dedicated 
> EBS volume.  My understanding is that the content-claims for a single file 
> can potentially span across more than one of these repos (correct me if I've 
> lost my mind over the years).
> For instance, if you have a 1 MB file, and let's say your 
> max.content.claim.size is 100KB, that's roughly ten 100KB claims potentially 
> split up across the 3 EBS volumes.  So if Nifi is trying to move that file to 
> S3 or something, for instance... it needs to be read from each of the volumes. 
>  
> Whereas if it was a single EBS volume for the cont-repo... it would read from 
> the single volume, which I would think would be more performant?  Or does 
> spreading out any IO contention across volumes provide more of a benefit?
> I know there are different levels of EBS volumes... but I'm not factoring 
> that in for right now.
> 
> Appreciate any insight... trying to determine the best configuration.  
> 
> Thanks,
> Phil
> 
> 


Nifi - Content-repo on AWS-EBS volumes

2023-12-11 Thread Phillip Lord
Hello Nifi comrades,

Here's my scenario...
Let's say I have a Nifi cluster running on EC2 instances with attached EBS
volumes serving as their repos.  They've split up their content-repos into
three content-repos per node (cont1, cont2, cont3), each being a dedicated
EBS volume.  My understanding is that the content-claims for a single file
can potentially span across more than one of these repos (correct me if
I've lost my mind over the years).
For instance, if you have a 1 MB file, and let's say your
max.content.claim.size is 100KB, that's roughly ten 100KB claims potentially
split up across the 3 EBS volumes.  So if Nifi is trying to move that file
to S3 or something, for instance... it needs to be read from each of the
volumes.
Whereas if it was a single EBS volume for the cont-repo... it would read
from the single volume, which I would think would be more performant?  Or
does spreading out any IO contention across volumes provide more of a
benefit?
I know there are different levels of EBS volumes... but I'm not factoring
that in for right now.

Appreciate any insight... trying to determine the best configuration.

Thanks,
Phil