Hi Joe,

Yeah, that's right. Thank you very much for your help.

Cheers,
Ali

On May 18, 2017 1:03 AM, "Joe Skora" <[email protected]> wrote:

> I think the notes on multiple locations for a repository are based on
> independent disks, not shared storage. That's why I don't think it will
> help in a shared storage environment.
>
> Yes, I can see a potential performance loss if NiFi is given multiple
> locations for a repository if the underlying storage (shared or otherwise)
> does not provide a performance gain greater than the overhead of managing
> multiple storage locations, but those will vary based on the system and
> flow.
>
> On Wed, May 17, 2017 at 10:30 AM, Ali Nazemian <[email protected]> wrote:
>
>> Hi Joe,
>>
>> I understand the situation with DAS, and it is the recommended option
>> for a production environment, but with shared storage such as SAN or
>> NAS, I am not sure how we would see even slightly more throughput from
>> having multiple disk volumes for the content repo.
>>
>> At the storage layer, data is written to and read from multiple disks
>> anyway. NiFi distributes content across the content repos in a
>> round-robin fashion. On the other hand, shared storage distributes data
>> through a RAID mechanism. Could we face a situation where throughput
>> actually decreases due to a conflict between the shared storage
>> distribution mechanism and NiFi's round-robin approach?
>>
>> Cheers,
>> Ali
>>
>> On Thu, May 18, 2017 at 12:21 AM, Ali Nazemian <[email protected]> wrote:
>>
>>> Hi Juan,
>>>
>>> Thank you very much, I have already seen those documents. So it is
>>> completely clear to me for a Direct Attached Storage scenario, but I am
>>> investigating the situation of a fully virtualized platform with
>>> shared storage.
>>>
>>> Cheers,
>>> Ali
>>>
>>> On Thu, May 18, 2017 at 12:00 AM, Joe Skora <[email protected]> wrote:
>>>
>>>> What I meant is that in general, multiple disks have a higher potential
>>>> maximum throughput than a single disk.
>>>> For example, if a single 1TB disk capable of 160MB/s is split into
>>>> 4x 250GB volumes, the total combined bandwidth of the volumes is still
>>>> 160MB/s, but if data is distributed across four 250GB disks each
>>>> capable of 160MB/s, the potential throughput is up to 640MB/s. The
>>>> motherboard, operating system, volume of files, file sizes, and
>>>> physical distribution of data across the disks will all affect the
>>>> actual bandwidth seen.
>>>>
>>>> On virtualized disks, the disk configuration and physical distribution
>>>> of data cannot be controlled, so splitting the volumes doesn't give the
>>>> same performance benefit.
>>>>
>>>> On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian <[email protected]> wrote:
>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Can you please explain how we would still see a performance increase
>>>>> from using multiple volumes for each repository? So, practically,
>>>>> using different volumes for FlowFile, Provenance and Content would
>>>>> avoid the space collision situation. Based on the example mentioned,
>>>>> 100GB FlowFile, 1TB prov and 4TB Content Repo should still have less
>>>>> throughput than 100GB FlowFile, 2x500GB prov and 8x500GB content repo
>>>>> in practice in a fully virtualized environment.
>>>>>
>>>>> Regards,
>>>>> Ali
>>>>>
>>>>> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>>>>>
>>>>>> Ali,
>>>>>>
>>>>>> If you can separate the repositories onto separate physical spindles
>>>>>> I would expect a performance benefit, but if they are all on
>>>>>> virtualized storage I'd expect less performance benefit from separate
>>>>>> volumes. But, even on virtualized storage, separate volumes can help
>>>>>> reduce space collision problems, preventing runaway system logs or
>>>>>> the provenance repository, for instance, from filling the disk and
>>>>>> running the content repository out of space.
>>>>>>
>>>>>> Regards,
>>>>>> Joe S
>>>>>>
>>>>>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I was wondering whether there is any throughput benefit to having
>>>>>>> multiple disk mount points for FlowFile, Provenance and Content,
>>>>>>> compared with using a single mount point for all of them, if we are
>>>>>>> using a fully virtualized deployment with shared storage. Suppose we
>>>>>>> have got 500TB disks in the shared storage. Which one do you
>>>>>>> suggest: 100GB for FlowFile, 2x500GB for Provenance and 8x500GB for
>>>>>>> the Content repository, or using a single mount point of 5.1TB for
>>>>>>> the entire instance? In other words, would it be better for NiFi to
>>>>>>> keep track of load among the disk mount points, or to delegate it
>>>>>>> entirely to the shared storage?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ali
>>>>>
>>>>> --
>>>>> A.Nazemian
>>>
>>> --
>>> A.Nazemian
>>
>> --
>> A.Nazemian
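
The disk arithmetic in Joe's example from the thread (one 160MB/s disk split into four volumes versus four separate 160MB/s disks) can be sketched as below. This is only an illustrative upper-bound model: the function name and parameters are hypothetical, and it deliberately ignores the motherboard, operating system, file size, and data placement effects that the thread says determine the actual bandwidth.

```python
def potential_throughput_mb_s(physical_disks, per_disk_mb_s):
    """Upper bound on combined bandwidth in MB/s (hypothetical helper).

    Assumes bandwidth scales with the number of physical disks only;
    how many volumes a disk is split into does not enter the model.
    """
    return physical_disks * per_disk_mb_s


# One 1TB disk split into four 250GB volumes: still one physical device.
single_disk_split = potential_throughput_mb_s(physical_disks=1, per_disk_mb_s=160)

# Four separate 250GB disks, each capable of 160MB/s.
four_disks = potential_throughput_mb_s(physical_disks=4, per_disk_mb_s=160)

print(single_disk_split, four_disks)
```

On shared or virtualized storage the number of physical disks behind a volume is not under your control, so this model gives no reason to expect a gain from adding volumes there.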
