Hi Juan,

Thank you very much. I have already seen those documents, so the Direct
Attached Storage scenario is completely clear to me, but I am
investigating the case of a fully virtualized platform with shared
storage.

Cheers,
Ali

On Thu, May 18, 2017 at 12:00 AM, Joe Skora wrote:

> What I meant is that in general, multiple disks have a higher potential
> maximum throughput than a single disk.  For example, if a single 1TB disk
> capable of 160MB/s is split into 4x 250GB volumes the total combined
> bandwidth of the volumes is still 160MB/s, but if data is distributed
> across four 250GB disks capable of 160MB/s the potential throughput is up
> to 640MB/s.  The motherboard, operating system, volume of files, file
> sizes, and physical distribution of data across the disks will all affect
> the actual bandwidth seen.
>
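
A minimal sketch of the arithmetic in the example above, in Python. The
function name is hypothetical and the calculation assumes data is spread
evenly across the disks; it gives an upper bound, not the bandwidth a real
system would actually see.

```python
def potential_throughput_mb_s(per_disk_mb_s, physical_disks):
    """Upper bound on combined throughput in MB/s.

    Volumes carved out of one physical disk share that disk's bandwidth,
    so only the number of physical disks counts here (an idealized
    assumption: even data distribution, no other bottlenecks).
    """
    return per_disk_mb_s * physical_disks


# One 1TB disk split into 4x 250GB volumes: still one physical disk.
single_disk_split = potential_throughput_mb_s(160, physical_disks=1)

# Four separate 250GB physical disks with data distributed across them.
four_disks = potential_throughput_mb_s(160, physical_disks=4)

print(single_disk_split, four_disks)
```
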
> On virtualized disks, the disk configuration and physical distribution of
> data cannot be controlled so splitting the volumes doesn't give the same
> performance benefit.
>
> On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian wrote:
>
>> Hi Joe,
>>
>> Can you please explain why we would still see a performance increase
>> from using multiple volumes for each repository? So, in practice, using
>> different volumes for FlowFile, Provenance and Content would avoid the
>> space collision problem. Based on the example mentioned, a 100GB
>> FlowFile, 1TB Provenance and 4TB Content repository layout should still
>> have less throughput in practice than a 100GB FlowFile, 2x500GB
>> Provenance and 8x500GB Content repository layout in a fully virtualized
>> environment.
>>
>> Regards,
>> Ali
>>
>> On Wed, May 17, 2017 at 10:06 PM, Joe Skora wrote:
>>
>>> Ali,
>>>
>>> If you can separate the repositories onto separate physical spindles I
>>> would expect a performance benefit, but if they are all on virtualized
>>> storage I'd expect less performance benefit from separate volumes.  But,
>>> even on virtualized storage, separate volumes can help reduce space
>>> collision problems, preventing runaway system logs or the provenance
>>> repository, for instance, from filling the disk and running the content
>>> repository out of space.
>>>
>>> Regards,
>>> Joe S
>>>
>>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian wrote:
>>>
>>>> Hi all,
>>>>
>>>> I was wondering whether there is any throughput benefit to having
>>>> multiple disk mount points for FlowFile, Provenance and Content, compared
>>>> with a single mount point for all of them, if we are using a fully
>>>> virtualized deployment with shared storage. Suppose we have 500TB disks
>>>> in the shared storage. Which do you suggest: 100GB for FlowFile, 2x500GB
>>>> for Provenance and 8x500GB for the Content repository, or a single mount
>>>> point of 5.1TB for the entire instance? In other words, would it be
>>>> better for NiFi to keep track of load among the disk mount points, or to
>>>> delegate that entirely to the shared storage?
>>>>
>>>> Regards,
>>>> Ali
>>>>
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>
>


-- 
A.Nazemian
