What I meant is that, in general, multiple disks have a higher potential
maximum throughput than a single disk.  For example, if a single 1TB disk
capable of 160MB/s is split into 4x 250GB volumes, the combined bandwidth
of the volumes is still 160MB/s, but if data is distributed across four
250GB disks, each capable of 160MB/s, the potential throughput is up to
640MB/s.  The motherboard, operating system, number of files, file sizes,
and physical distribution of data across the disks will all affect the
actual bandwidth seen.
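A rough sketch of that arithmetic in Python, assuming ideal conditions
(I/O spread evenly, nothing else in the path slowing things down).
The bus_limit_mbps parameter is a hypothetical stand-in for the
motherboard/controller limits mentioned above, not a measured value.

```python
from typing import Optional


def split_single_disk(disk_mbps: float, volumes: int) -> float:
    """Combined bandwidth when one physical disk is partitioned into volumes.

    All volumes share the same disk, so each gets only a slice of its
    bandwidth and the total never exceeds the disk's own rate.
    """
    per_volume = disk_mbps / volumes  # volumes compete for the same disk
    return per_volume * volumes


def separate_disks(disk_mbps: float, disks: int,
                   bus_limit_mbps: Optional[float] = None) -> float:
    """Upper bound when data is spread across independent physical disks.

    Each disk contributes its full rate; the optional (hypothetical)
    bus/controller limit caps the total.
    """
    total = disk_mbps * disks
    if bus_limit_mbps is not None:
        total = min(total, bus_limit_mbps)
    return total


print(split_single_disk(160.0, 4))  # 160.0
print(separate_disks(160.0, 4))     # 640.0
```

This is only a best-case ceiling; real numbers depend on the factors listed
above.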

On virtualized disks, the disk configuration and physical distribution of
data cannot be controlled, so splitting the volumes doesn't give the same
performance benefit.

On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian <[email protected]> wrote:

> Hi Joe,
>
> Can you please explain why we would still see a performance increase from
> using multiple volumes for each repository? In practice, using different
> volumes for FlowFile, Provenance and Content would overcome the space
> collision problem. Based on the example mentioned, would a 100GB FlowFile,
> 1TB prov and 4TB Content repo still have less throughput than a 100GB
> FlowFile, 2x500GB prov and 8x500GB content repo in practice, in a fully
> virtualized environment?
>
> Regards,
> Ali
>
> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>
>> Ali,
>>
>> If you can separate the repositories onto separate physical spindles I
>> would expect a performance benefit, but if they are all on virtualized
>> storage I'd expect less performance benefit from separate volumes.  But,
>> even on virtualized storage, separate volumes can help reduce space
>> collision problems, preventing runaway system logs or the provenance
>> repository, for instance, from filling the disk and running the content
>> repository out of space.
>>
>> Regards,
>> Joe S
>>
>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering whether there is any throughput benefit to having
>>> multiple disk mount points for FlowFile, Provenance and Content versus
>>> using a single mount point for all of them, if we are using a fully
>>> virtualized deployment with shared storage. Suppose we have 500TB disks in
>>> the shared storage. Which one do you suggest: 100GB for FlowFile, 2x500GB
>>> for Provenance and 8x500GB for the Content repository, or a single mount
>>> point of 5.1TB for the entire instance? In other words, is it better for
>>> NiFi to keep track of load across the disk mount points, or to delegate
>>> that entirely to the shared storage?
>>>
>>> Regards,
>>> Ali
>>>
>>
>>
>
>
> --
> A.Nazemian
>
