I think the notes on multiple locations for a repository assume
independent disks, not shared storage.  That's why I don't think multiple
locations will help in a shared storage environment.

Yes, I can see a potential performance loss when NiFi is given multiple
locations for a repository and the underlying storage (shared or otherwise)
does not provide a performance gain greater than the overhead of managing
multiple storage locations, but both the gain and the overhead will vary
based on the system and flow.
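
A rough way to picture this, as a minimal sketch only: the ceiling on
combined throughput depends on how many distinct physical devices sit
behind the volumes, not on how many volumes there are.  The function and
the volume/disk names below are hypothetical, the 160MB/s figure is taken
from the example quoted further down, and the model ignores controller,
OS, and file-size effects, so it gives an upper bound and not a prediction.

```python
def max_throughput_mb_s(volume_to_device, device_bandwidth_mb_s):
    """Upper bound on combined throughput for a set of volumes.

    Each physical device counts once, no matter how many volumes
    are carved out of it (simplifying assumption).
    """
    devices = set(volume_to_device.values())
    return sum(device_bandwidth_mb_s[d] for d in devices)


# Hypothetical devices, each capable of 160MB/s.
bandwidth = {"diskA": 160, "diskB": 160, "diskC": 160, "diskD": 160}

# One 1TB disk split into four 250GB volumes.
split_volumes = {"vol1": "diskA", "vol2": "diskA",
                 "vol3": "diskA", "vol4": "diskA"}

# Four independent 250GB disks, one volume each.
independent_disks = {"vol1": "diskA", "vol2": "diskB",
                     "vol3": "diskC", "vol4": "diskD"}

print(max_throughput_mb_s(split_volumes, bandwidth))      # 160
print(max_throughput_mb_s(independent_disks, bandwidth))  # 640
```

On shared or virtualized storage the volume-to-device mapping is not
visible from the host, which is why the gain from extra repository
locations cannot be assumed there.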

On Wed, May 17, 2017 at 10:30 AM, Ali Nazemian <[email protected]>
wrote:

> Hi Joe,
>
> I understand the case for DAS, and that it is the recommended option for
> a production environment, but with shared storage such as SAN or NAS, I am
> not sure how we could see even slightly more throughput from having
> multiple disk volumes for the content repo.
>
> At the storage layer, data is written to and read from multiple disks
> anyway. NiFi distributes content across the content repos in a round-robin
> way, while shared storage distributes data through its RAID mechanism.
> Could throughput actually decrease because of a conflict between the
> shared storage distribution mechanism and the NiFi round-robin approach?
>
> Cheers,
> Ali
>
> On Thu, May 18, 2017 at 12:21 AM, Ali Nazemian <[email protected]>
> wrote:
>
>> Hi Juan,
>>
>> Thank you very much. I have already seen those documents, so the
>> Direct Attached Storage scenario is completely clear to me, but I am
>> investigating the case of a fully virtualized platform with shared
>> storage.
>>
>> Cheers,
>> Ali
>>
>> On Thu, May 18, 2017 at 12:00 AM, Joe Skora <[email protected]> wrote:
>>
>>> What I meant is that in general, multiple disks have a higher potential
>>> maximum throughput than a single disk.  For example, if a single 1TB disk
>>> capable of 160MB/s is split into 4x 250GB volumes the total combined
>>> bandwidth of the volumes is still 160MB/s, but if data is distributed
>>> across four 250GB disks capable of 160MB/s the potential throughput is up
>>> to 640MB/s.  The motherboard, operating system, volume of files, file
>>> sizes, and physical distribution of data across the disks will all affect
>>> the actual bandwidth seen.
>>>
>>> On virtualized disks, the disk configuration and physical distribution
>>> of data cannot be controlled so splitting the volumes doesn't give the same
>>> performance benefit.
>>>
>>> On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian <[email protected]>
>>> wrote:
>>>
>>>> Hi Joe,
>>>>
>>>> Can you please explain whether we would still see a performance
>>>> increase from using multiple volumes for each repository? I understand
>>>> that using different volumes for FlowFile, Provenance and Content would
>>>> avoid the space collision situation. Based on the example mentioned,
>>>> would 100GB FlowFile, 1TB prov and 4TB Content Repo still have less
>>>> throughput than 100GB FlowFile, 2x500GB prov and 8x500GB content repo
>>>> in practice for a fully virtualized environment?
>>>>
>>>> Regards,
>>>> Ali
>>>>
>>>> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>>>>
>>>>> Ali,
>>>>>
>>>>> If you can separate the repositories onto separate physical spindles I
>>>>> would expect a performance benefit, but if they are all on virtualized
>>>>> storage I'd expect less performance benefit from separate volumes.  But,
>>>>> even on virtualized storage, separate volumes can help reduce space
>>>>> collision problems, preventing runaway system logs or the provenance
>>>>> repository, for instance, from filling the disk and running the content
>>>>> repository out of space.
>>>>>
>>>>> Regards,
>>>>> Joe S
>>>>>
>>>>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I was wondering whether there is any throughput benefit from having
>>>>>> multiple disk mount points for FlowFile, Provenance and Content versus
>>>>>> using a single mount point for all of them if we are using a fully
>>>>>> virtualized deployment with shared storage. Suppose we have got 500TB
>>>>>> disks in the shared storage. Which one do you suggest: 100GB for
>>>>>> FlowFile, 2x500GB for Provenance and 8x500GB for the Content repository,
>>>>>> or a single mount point of 5.1TB for the entire instance? In other
>>>>>> words, would it be better for NiFi to keep track of load among the disk
>>>>>> mount points, or to delegate it entirely to the shared storage?
>>>>>>
>>>>>> Regards,
>>>>>> Ali
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> A.Nazemian
>>>>
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>
>
>
> --
> A.Nazemian
>
