Hi Juan,

Thank you very much. I have already seen those documents, so the Direct Attached Storage scenario is completely clear to me. What I am investigating is a fully virtualized platform with shared storage.

Cheers,
Ali

On Thu, May 18, 2017 at 12:00 AM, Joe Skora <[email protected]> wrote:

> What I meant is that in general, multiple disks have a higher potential
> maximum throughput than a single disk. For example, if a single 1TB disk
> capable of 160MB/s is split into 4x 250GB volumes the total combined
> bandwidth of the volumes is still 160MB/s, but if data is distributed
> across four 250GB disks capable of 160MB/s the potential throughput is up
> to 640MB/s. The motherboard, operating system, volume of files, file
> sizes, and physical distribution of data across the disks will all affect
> the actual bandwidth seen.
>
> On virtualized disks, the disk configuration and physical distribution of
> data cannot be controlled so splitting the volumes doesn't give the same
> performance benefit.
>
> On Wed, May 17, 2017 at 9:27 AM, Ali Nazemian <[email protected]>
> wrote:
>
>> Hi Joe,
>>
>> Can you please explain what will happen that still we will see a
>> performance increase through using multiple volumes for each repository? So
>> practically using different volumes for FlowFile, Provenance and Content
>> would overcome space collision situation. Based on the mentioned example so
>> 100GB FlowFile, 1TB prov and 4TB Content Repo should still have less
>> throughput than 100GB FlowFile, 2x500GB prov and 8x500GB content repo in
>> practice for a fully virtualized environment.
>>
>> Regards,
>> Ali
>>
>> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>>
>>> Ali,
>>>
>>> If you can separate the repositories onto separate physical spindles I
>>> would expect a performance benefit, but if they are all on virtualized
>>> storage I'd expect less performance benefit from separate volumes. But,
>>> even on virtualized storage, separate volumes can help reduce space
>>> collision problems, preventing runaway system logs or the provenance
>>> repository, for instance, from filling the disk and running the content
>>> repository out of space.
>>>
>>> Regards,
>>> Joe S
>>>
>>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I was wondering whether there is any performance throughput of having
>>>> multiple disk mount points for FlowFile, Provenance and Content or using
>>>> single mount point for all of them if we are using a fully virtualized
>>>> deployment with a shared storage. Suppose we have got 500TB disks in the
>>>> Share Storage. Which one do you suggest: 100 GB for FlowFile 2x500GB for
>>>> Provenance and 8x500GB for the Content repository or using a single mount
>>>> point of 5.1TB for the entire instance? In another word, it would be better
>>>> Nifi keeps track of load among the disk mount points or delegate it
>>>> entirely to the shared storage?
>>>>
>>>> Regards,
>>>> Ali
>>>>
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>
>

--
A.Nazemian
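
For reference, the arithmetic in Joe's example (one 160MB/s disk split into four volumes versus four separate 160MB/s disks) can be sketched roughly as below. This is only an illustrative upper-bound model, not a benchmark: the function name, the volume and device labels, and the assumption that throughput scales linearly with the number of distinct physical devices are all hypothetical simplifications. As noted in the thread, the motherboard, operating system, file sizes, and data distribution all affect the real figure.

```python
from typing import Mapping


def max_throughput_mb_s(volume_to_device: Mapping[str, str],
                        per_device_mb_s: float) -> float:
    """Idealized upper bound on combined throughput in MB/s.

    Assumption (illustrative only): every physical device delivers the
    same bandwidth and load is spread perfectly, so only the number of
    distinct backing devices matters, not the number of volumes.
    """
    distinct_devices = set(volume_to_device.values())
    return len(distinct_devices) * per_device_mb_s


# One 1TB disk carved into four 250GB volumes: all share one device.
split_single_disk = {
    "vol1": "disk_a", "vol2": "disk_a", "vol3": "disk_a", "vol4": "disk_a",
}

# Four separate 250GB disks, one volume each.
four_physical_disks = {
    "vol1": "disk_a", "vol2": "disk_b", "vol3": "disk_c", "vol4": "disk_d",
}

single_bound = max_throughput_mb_s(split_single_disk, 160)
multi_bound = max_throughput_mb_s(four_physical_disks, 160)

print(single_bound)  # bound for the split single disk
print(multi_bound)   # bound for four physical disks
```

On shared virtualized storage the mapping from volumes to physical devices is generally not visible or controllable from the guest, which is why the thread expects less benefit from splitting volumes there.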
