Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-15 Thread Phillip Lord
I just switched a cluster that uses 3 EBS volumes for cont-repo from gp2 to gp3, 
and it resolved some definite I/O throughput issues.  The change to gp3 was 
significant enough that I might actually reduce from 3 to 2 volumes; perhaps 
even a single volume would be sufficient.

Of course every use case is unique.
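
For a rough sense of why the volume type alone can matter this much, here is a small sketch of the baseline numbers. It assumes AWS's published figures at the time of writing (gp2: 3 IOPS per GiB, floor of 100, ceiling of 16,000; gp3: a flat 3,000 IOPS and 125 MiB/s baseline regardless of size), so verify them against the current EBS documentation. The function and constant names are made up for illustration, and gp2 burst credits are deliberately ignored.

```python
# Assumed AWS baseline figures; check the current EBS docs before relying on them.
GP3_BASELINE_IOPS = 3000            # gp3 baseline, independent of volume size
GP3_BASELINE_THROUGHPUT_MIBS = 125  # gp3 baseline throughput in MiB/s


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return min(max(3 * size_gib, 100), 16000)


def gp3_iops_advantage(size_gib: int) -> float:
    """Ratio of gp3 baseline IOPS to gp2 baseline IOPS for the same size."""
    return GP3_BASELINE_IOPS / gp2_baseline_iops(size_gib)


if __name__ == "__main__":
    for size in (100, 200, 500, 1000):
        print(size, gp2_baseline_iops(size), round(gp3_iops_advantage(size), 2))
```

Under these assumptions a small gp2 volume sits well below the gp3 baseline, which would be consistent with needing several gp2 volumes to get the same sustained I/O.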


Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-15 Thread Gregory M. Foreman
Mark:

Got it.  Thank you for the help.

Greg




Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-15 Thread Mark Payne
Greg,

Whether or not multiple content repos will have any impact depends very much on 
where your system’s bottleneck is. If your bottleneck is disk I/O, it will 
absolutely help. If your bottleneck is CPU, it won’t. If, for example, you’re 
running on bare metal and have 48 cores on your machine and you’re running with 
spinning disks, you’ll definitely want to use multiple spinning disks. But if 
you’re running in AWS on a VM that has 4 cores and you’re using gp3 EBS 
volumes, it’s unlikely that multiple content repos will help.

Thanks
-Mark
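
The rule of thumb above can be written down as a tiny decision helper. This is only a sketch: the inputs are assumed to come from whatever monitoring you already have (for example the CPU busy and iowait percentages reported by top or iostat), and the thresholds are arbitrary placeholders, not values from NiFi or AWS.

```python
def likely_bottleneck(cpu_busy_pct: float, iowait_pct: float,
                      cpu_threshold: float = 80.0,
                      iowait_threshold: float = 20.0) -> str:
    """Classify the bottleneck from two utilisation percentages.

    Thresholds are hypothetical defaults; tune them for your own system.
    """
    if iowait_pct >= iowait_threshold:
        return "disk"
    if cpu_busy_pct >= cpu_threshold:
        return "cpu"
    return "neither"


def more_content_repos_may_help(cpu_busy_pct: float, iowait_pct: float) -> bool:
    """Extra content repos only pay off when disk I/O is the limiting factor."""
    return likely_bottleneck(cpu_busy_pct, iowait_pct) == "disk"


if __name__ == "__main__":
    # Bare metal, many cores, spinning disks: CPU mostly idle, lots of iowait.
    print(more_content_repos_may_help(cpu_busy_pct=30.0, iowait_pct=40.0))
    # Small VM on fast volumes: CPU saturated, almost no iowait.
    print(more_content_repos_may_help(cpu_busy_pct=95.0, iowait_pct=2.0))
```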






Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-15 Thread Gregory M. Foreman
Mark:

I was just discussing multiple content repos on EBS volumes with a colleague.  
I found your post from a long time ago:

https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv

“Re #2: I don't know that i've used any SAN to back my repositories other than 
the EBS provided by Amazon EC2. In that environment, I found that having one or 
having multiple repos was essentially equivalent.”

Does that statement still hold true today?  Essentially there is no real 
performance benefit to having multiple content repos on multiple EBS volumes?

Thanks,
Greg






Re: Nifi - Content-repo on AWS-EBS volumes

2023-12-11 Thread Mark Payne
Hey Phil,

NiFi will not spread the content of a single file over multiple partitions. It 
will write the content of FlowFile 1 to content repo 1, then write the next 
FlowFile to repo 2, and so on. So it does round-robin across repos, but it does 
not spread a single FlowFile across multiple repos.

Thanks
-Mark
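
The placement described above can be pictured with a toy model. This is not NiFi's code, only a minimal sketch of round-robin assignment at FlowFile granularity; the class name, function name, and FlowFile ids are hypothetical, and the repo names reuse cont1/cont2/cont3 from the original question.

```python
from itertools import cycle
from typing import Dict, Iterable, Sequence


class RoundRobinRepoSelector:
    """Hands out repo names in rotation, one per FlowFile (toy model)."""

    def __init__(self, repo_names: Sequence[str]) -> None:
        if not repo_names:
            raise ValueError("at least one content repo is required")
        self._rotation = cycle(list(repo_names))

    def assign(self) -> str:
        """Return the repo that receives the next FlowFile's entire content."""
        return next(self._rotation)


def place_flowfiles(flowfile_ids: Iterable[str],
                    repo_names: Sequence[str]) -> Dict[str, str]:
    """Map each FlowFile id to exactly one repo; content is never split."""
    selector = RoundRobinRepoSelector(repo_names)
    return {ff_id: selector.assign() for ff_id in flowfile_ids}


if __name__ == "__main__":
    print(place_flowfiles(["ff1", "ff2", "ff3", "ff4"],
                          ["cont1", "cont2", "cont3"]))
```

Each FlowFile maps to a single repo, so reading one file back touches one volume; the spread across volumes only shows up across many FlowFiles.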



Nifi - Content-repo on AWS-EBS volumes

2023-12-11 Thread Phillip Lord
Hello NiFi comrades,

Here's my scenario...
Let's say I have a NiFi cluster running on EC2 instances with attached EBS
volumes serving as their repos.  They've split up their content-repos into
three content-repos per node (cont1, cont2, cont3), each being a dedicated
EBS volume.  My understanding is that the content-claims for a single file
can potentially span more than one of these repos. (Correct me if
I've lost my mind over the years.)
For instance, if you have a 1 MB file, and let's say your
max.content.claim.size is 100KB, that's roughly ten 100KB claims,
potentially split up across the 3 EBS volumes.  So if NiFi is trying to move
that file to S3 or something, for instance... it needs to be read from each
of the volumes.
Whereas if it was a single EBS volume for the cont-repo... it would read
from the single volume, which I would think would be more performant?  Or
does spreading out any IO contention across volumes provide more of a
benefit?
I know there are different levels of EBS volumes... but I'm not factoring
that in for right now.

Appreciate any insight... trying to determine the best configuration.

Thanks,
Phil
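
The "ten-ish claims" figure in the question is plain division, sketched below. It only reproduces the arithmetic as stated; per Mark's reply in this thread, the content of a single FlowFile is not actually spread across repos. The function name is hypothetical, and the 100KB value is just the number used in the question.

```python
def claim_count(file_bytes: int, claim_bytes: int) -> int:
    """Number of fixed-size chunks needed to hold file_bytes (ceiling division)."""
    if claim_bytes <= 0:
        raise ValueError("claim size must be positive")
    return -(-file_bytes // claim_bytes)


if __name__ == "__main__":
    # Decimal units: 1 MB = 1,000,000 bytes, 100 KB = 100,000 bytes.
    print(claim_count(1_000_000, 100_000))
    # Binary units: 1 MiB = 1,048,576 bytes, 100 KiB = 102,400 bytes.
    print(claim_count(1024 * 1024, 100 * 1024))
```

The count is exact with decimal units and rounds up by one with binary units, which accounts for the "(ish)".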