Mark:

Got it.  Thank you for the help.

Greg

> On Dec 15, 2023, at 4:14 PM, Mark Payne <[email protected]> wrote:
> 
> Greg,
> 
> Whether or not multiple content repos will have any impact depends very much 
> on where your system’s bottleneck is. If your bottleneck is disk I/O, it will 
> absolutely help. If your bottleneck is CPU, it won’t. If, for example, you’re 
> running on bare metal and have 48 cores on your machine and you’re running 
> with spinning disks, you’ll definitely want to use multiple spinning disks. 
> But if you’re running in AWS on a VM that has 4 cores and you’re using gp3 
> EBS volumes, it’s unlikely that multiple content repos will help.
> 
> Thanks
> -Mark
> 
> 
> 
>> On Dec 15, 2023, at 3:25 PM, Gregory M. Foreman 
>> <[email protected]> wrote:
>> 
>> Mark:
>> 
>> I was just discussing multiple content repos on EBS volumes with a 
>> colleague.  I found your post from a long time ago:
>> 
>> https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv
>> 
>> “Re #2: I don't know that i've used any SAN to back my repositories other 
>> than the EBS provided by Amazon EC2. In that environment, I found that 
>> having one or having multiple repos was essentially equivalent.”
>> 
>> Does that statement still hold true today?  That is, is there essentially 
>> no real performance benefit to having multiple content repos on multiple 
>> EBS volumes?
>> 
>> Thanks,
>> Greg
>> 
>> 
>> 
>>> On Dec 11, 2023, at 8:50 PM, Mark Payne <[email protected]> wrote:
>>> 
>>> Hey Phil,
>>> 
>>> NiFi will not spread the content of a single file over multiple partitions. 
>>> It will write the content of FlowFile 1 to content repo 1, then write the 
>>> next FlowFile to repo 2, and so on. So it does round-robin across repos, 
>>> but it does not spread a single FlowFile across multiple repos.
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Dec 11, 2023, at 8:45 PM, Phillip Lord <[email protected]> wrote:
>>>> 
>>>> 
>>>> Hello NiFi comrades,
>>>> 
>>>> Here's my scenario...
>>>> Let's say I have a NiFi cluster running on EC2 instances with attached EBS 
>>>> volumes serving as their repos.  The content repo is split into three 
>>>> content repos per node (cont1, cont2, cont3), each on a dedicated EBS 
>>>> volume.  My understanding is that the content claims for a single file 
>>>> can potentially span more than one of these repos (correct me if 
>>>> I've lost my mind over the years).
>>>> For instance, if you have a 1 MB file and, let's say, your 
>>>> max.content.claim.size is 100KB, that's roughly ten 100KB claims 
>>>> potentially split up across the 3 EBS volumes.  So if NiFi is trying to 
>>>> move that file to S3, for instance, it needs to be read from each of the 
>>>> volumes.
>>>> Whereas with a single EBS volume for the content repo, it would read 
>>>> from the single volume, which I would think would be more performant?  Or 
>>>> does spreading out any IO contention across volumes provide more of a 
>>>> benefit?
>>>> I know there are different levels of EBS volumes, but I'm not factoring 
>>>> that in for right now.
>>>> 
>>>> Appreciate any insight... trying to determine the best configuration.  
>>>> 
>>>> Thanks,
>>>> Phil
>>>> 
>>>> 
>> 
> 
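
For anyone reading this thread later, the round-robin behavior Mark describes (each FlowFile's content goes to exactly one content repo, and successive FlowFiles rotate through the repos) can be illustrated with a small sketch. This is not NiFi code, and it does not model content claims: the function name assign_round_robin and the FlowFile names are hypothetical, and the repo names are borrowed from Phil's example.

```python
from itertools import cycle


def assign_round_robin(flowfiles, repos):
    """Map each FlowFile name to exactly one repo, rotating through repos in order.

    Illustrative only: a hypothetical helper, not NiFi's implementation.
    """
    if not repos:
        raise ValueError("at least one content repo is required")
    repo_cycle = cycle(repos)
    # Each FlowFile gets a single repo; no FlowFile is split across repos.
    return {name: next(repo_cycle) for name in flowfiles}


repos = ["cont1", "cont2", "cont3"]  # repo names from Phil's example
flowfiles = ["flowfile-1", "flowfile-2", "flowfile-3", "flowfile-4"]  # made-up names

placement = assign_round_robin(flowfiles, repos)
for name, repo in placement.items():
    print(name, "->", repo)
```

With three repos, the fourth FlowFile wraps back around to cont1. Reading any one FlowFile back touches only the volume that holds it, so in this simplified model spreading repos across volumes helps only when several FlowFiles are being read or written at once and disk I/O is the bottleneck.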
