Mark: Got it. Thank you for the help.
Greg

> On Dec 15, 2023, at 4:14 PM, Mark Payne <[email protected]> wrote:
>
> Greg,
>
> Whether or not multiple content repos will have any impact depends very much
> on where your system’s bottleneck is. If your bottleneck is disk I/O, it will
> absolutely help. If your bottleneck is CPU, it won’t. If, for example, you’re
> running on bare metal and have 48 cores on your machine and you’re running
> with spinning disks, you’ll definitely want to use multiple spinning disks.
> But if you’re running in AWS on a VM that has 4 cores and you’re using gp3
> EBS volumes, it’s unlikely that multiple content repos will help.
>
> Thanks
> -Mark
>
>
>
>> On Dec 15, 2023, at 3:25 PM, Gregory M. Foreman
>> <[email protected]> wrote:
>>
>> Mark:
>>
>> I was just discussing multiple content repos on EBS volumes with a
>> colleague. I found your post from a long time ago:
>>
>> https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv
>>
>> “Re #2: I don't know that i've used any SAN to back my repositories other
>> than the EBS provided by Amazon EC2. In that environment, I found that
>> having one or having multiple repos was essentially equivalent.”
>>
>> Does that statement still hold true today? Essentially there is no real
>> performance benefit to having multiple content repos on multiple EBS volumes?
>>
>> Thanks,
>> Greg
>>
>>
>>
>>> On Dec 11, 2023, at 8:50 PM, Mark Payne <[email protected]> wrote:
>>>
>>> Hey Phil,
>>>
>>> NiFi will not spread the content of a single file over multiple partitions.
>>> It will write the content of FlowFile 1 to content repo 1, then write the
>>> next FlowFile to repo 2, etc. so it does round-robin but does not spread a
>>> single FlowFile across multiple repos.
>>>
>>> Thanks
>>> -Mark
>>>
>>> Sent from my iPhone
>>>
>>>> On Dec 11, 2023, at 8:45 PM, Phillip Lord <[email protected]> wrote:
>>>>
>>>>
>>>> Hello NiFi comrades,
>>>>
>>>> Here's my scenario...
>>>> Let's say I have a NiFi cluster running on EC2 instances with attached EBS
>>>> volumes serving as their repos. They've split up their content-repos into
>>>> three content-repos per node (cont1, cont2, cont3). Each being a dedicated
>>>> EBS volume. My understanding is that the content-claims for a single file
>>>> can potentially span across more than one of these repos (correct me if
>>>> I've lost my mind over the years).
>>>> For instance if you have a 1 MB file, and let's say your
>>>> max.content.claim.size is 100KB, that's 10 - 100KB claims(ish) potentially
>>>> split up across the 3 EBS volumes. So if NiFi is trying to move that file
>>>> to S3 or something for instance... it needs to be read from each of the
>>>> volumes.
>>>> Whereas if it was a single EBS volume for the cont-repo... it would read
>>>> from the single volume, which I would think would be more performant? Or
>>>> does spreading out any IO contention across volumes provide more of a
>>>> benefit?
>>>> I know there are different levels of EBS volumes... but not factoring that
>>>> in for right now.
>>>>
>>>> Appreciate any insight... trying to determine the best configuration.
>>>>
>>>> Thanks,
>>>> Phil
>>>>
>>>>
>>
>
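
For anyone reading this thread later, the placement Mark describes (each FlowFile's content goes to exactly one repo, and successive FlowFiles rotate through the repos) can be pictured with a small Python sketch. This is only a toy model of that description, not NiFi's actual code: the function name assign_round_robin and the FlowFile ids are made up for illustration, and the repo names cont1, cont2, cont3 are taken from Phil's example.

```python
from itertools import cycle


def assign_round_robin(flowfile_ids, repo_names):
    """Map each FlowFile to exactly one repo, rotating through the repos in order.

    Toy model only: a FlowFile is never split across repos here, which mirrors
    Mark's description rather than NiFi's real claim-management internals.
    """
    repos = cycle(repo_names)
    return {flowfile_id: next(repos) for flowfile_id in flowfile_ids}


# Hypothetical ids; cont1..cont3 are the repo names from Phil's scenario.
placement = assign_round_robin(
    ["flowfile-1", "flowfile-2", "flowfile-3", "flowfile-4"],
    ["cont1", "cont2", "cont3"],
)

for flowfile_id, repo in placement.items():
    print(f"{flowfile_id} -> {repo}")
```

Under this model, reading one FlowFile back (for example to send it to S3) touches a single volume, while concurrent reads of different FlowFiles can land on different volumes. That is why, as Mark notes, the extra repos only pay off when disk I/O is the bottleneck.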
