Hi Joe,

I understand the case for DAS, and that it is the recommended option for a
production environment. But with shared storage such as a SAN or NAS, I am
not sure how we would see even slightly more throughput from having
multiple disk volumes for the content repo.

At the storage layer, data is written to and read from multiple disks
anyway. NiFi distributes content across the content repos in a round-robin
way. The shared storage, on the other hand, distributes data through its
RAID mechanism. Could we end up in a situation where throughput actually
decreases because the shared storage's distribution mechanism conflicts
with NiFi's round-robin approach?
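
To make sure I have understood your numbers, here is a rough sketch of the
arithmetic from your example. It is only a toy model, not anything NiFi
does: the function name, the volume names and the disk names are all made
up by me, and it assumes that volumes on the same physical disk simply
share that disk's maximum bandwidth.

```python
def potential_throughput(volume_to_disk, disk_mb_per_s):
    """Upper bound on combined MB/s across a set of volumes.

    volume_to_disk: volume name -> physical disk it lives on (hypothetical)
    disk_mb_per_s:  physical disk name -> maximum MB/s of that disk
    """
    # Volumes on the same physical disk share its bandwidth,
    # so each physical disk is counted only once.
    disks_in_use = set(volume_to_disk.values())
    return sum(disk_mb_per_s[disk] for disk in disks_in_use)


# Every disk in the example is capable of 160MB/s.
speeds = {"diskA": 160, "diskB": 160, "diskC": 160, "diskD": 160}

# One 1TB disk split into 4x 250GB volumes.
split_single_disk = {"vol1": "diskA", "vol2": "diskA",
                     "vol3": "diskA", "vol4": "diskA"}

# Four separate 250GB disks, one volume on each.
four_disks = {"vol1": "diskA", "vol2": "diskB",
              "vol3": "diskC", "vol4": "diskD"}

print(potential_throughput(split_single_disk, speeds))  # 160
print(potential_throughput(four_disks, speeds))         # 640
```

On shared storage, the volume-to-disk mapping is exactly the part we cannot
see or control, which is why I am unsure what multiple volumes would buy us.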

Cheers,
Ali

On Thu, May 18, 2017 at 12:21 AM, Ali Nazemian <[email protected]>
wrote:

> Hi Juan,
>
> Thank you very much. I have already seen those documents, so the Direct
> Attached Storage scenario is completely clear to me, but I am
> investigating the case of a fully virtualized platform with shared
> storage.
>
> Cheers,
> Ali
>
> On Thu, May 18, 2017 at 12:00 AM, Joe Skora <[email protected]> wrote:
>
>> What I meant is that in general, multiple disks have a higher potential
>> maximum throughput than a single disk.  For example, if a single 1TB disk
>> capable of 160MB/s is split into 4x 250GB volumes the total combined
>> bandwidth of the volumes is still 160MB/s, but if data is distributed
>> across four 250GB disks capable of 160MB/s the potential throughput is up
>> to 640MB/s.  The motherboard, operating system, volume of files, file
>> sizes, and physical distribution of data across the disks will all affect
>> the actual bandwidth seen.
>>
>> On virtualized disks, the disk configuration and physical distribution of
>> data cannot be controlled so splitting the volumes doesn't give the same
>> performance benefit.
>>
>> On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian <[email protected]>
>> wrote:
>>
>>> Hi Joe,
>>>
>>> Can you please explain why we would still see a performance increase
>>> from using multiple volumes for each repository? As I understand it,
>>> using different volumes for FlowFile, Provenance and Content would avoid
>>> the space collision problem. But based on the example mentioned, would
>>> 100GB FlowFile, 1TB prov and 4TB Content Repo still have less throughput
>>> in practice than 100GB FlowFile, 2x500GB prov and 8x500GB content repo
>>> in a fully virtualized environment?
>>>
>>> Regards,
>>> Ali
>>>
>>> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>>>
>>>> Ali,
>>>>
>>>> If you can separate the repositories onto separate physical spindles I
>>>> would expect a performance benefit, but if they are all on virtualized
>>>> storage I'd expect less performance benefit from separate volumes.  But,
>>>> even on virtualized storage, separate volumes can help reduce space
>>>> collision problems, preventing runaway system logs or the provenance
>>>> repository, for instance, from filling the disk and running the content
>>>> repository out of space.
>>>>
>>>> Regards,
>>>> Joe S
>>>>
>>>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I was wondering whether there is any performance benefit to having
>>>>> multiple disk mount points for FlowFile, Provenance and Content, versus
>>>>> using a single mount point for all of them, if we are using a fully
>>>>> virtualized deployment with shared storage. Suppose we have got 500TB
>>>>> disks in the shared storage. Which one do you suggest: 100GB for
>>>>> FlowFile, 2x500GB for Provenance and 8x500GB for the Content repository,
>>>>> or a single mount point of 5.1TB for the entire instance? In other
>>>>> words, would it be better for NiFi to keep track of load among the disk
>>>>> mount points, or to delegate that entirely to the shared storage?
>>>>>
>>>>> Regards,
>>>>> Ali
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> A.Nazemian
>>>
>>
>>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian
