Ali,

Yes, that is correct. See the repositories discussion in the "Apache NiFi In Depth" section of the NiFi docs [1]. I also still use this article for general Apache NiFi best practices when handling high volumes of data [2]. It was written for Apache NiFi releases before 1.0, but it still applies.
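For reference, here is a minimal sketch of how the split layout from Ali's question (100GB FlowFile, 2x500GB provenance, 8x500GB content) might be expressed in nifi.properties. The property prefixes follow the NiFi Admin Guide convention of one `nifi.content.repository.directory.<suffix>` (or `nifi.provenance.repository.directory.<suffix>`) entry per directory, each with a unique suffix. The mount points under /mnt/ and the suffix names (content1, prov1, and so on) are hypothetical placeholders. Whether the extra content volumes add throughput on shared virtualized storage is exactly the open question in this thread, so treat this as a layout example rather than a tuning recommendation.

```properties
# FlowFile repository: single directory on its own volume
# (hypothetical ~100GB mount at /mnt/flowfile)
nifi.flowfile.repository.directory=/mnt/flowfile/flowfile_repository

# Content repository: one entry per volume; the text after
# "directory." is an arbitrary but unique name (hypothetical 8x500GB mounts)
nifi.content.repository.directory.content1=/mnt/content1/content_repository
nifi.content.repository.directory.content2=/mnt/content2/content_repository
nifi.content.repository.directory.content3=/mnt/content3/content_repository
nifi.content.repository.directory.content4=/mnt/content4/content_repository
nifi.content.repository.directory.content5=/mnt/content5/content_repository
nifi.content.repository.directory.content6=/mnt/content6/content_repository
nifi.content.repository.directory.content7=/mnt/content7/content_repository
nifi.content.repository.directory.content8=/mnt/content8/content_repository

# Provenance repository: same naming scheme (hypothetical 2x500GB mounts)
nifi.provenance.repository.directory.prov1=/mnt/prov1/provenance_repository
nifi.provenance.repository.directory.prov2=/mnt/prov2/provenance_repository
```

Even if the throughput gain on shared storage turns out to be small, keeping each repository on its own mount gives the space-collision protection Joe describes: a runaway provenance repository or system log can fill only its own volume. NiFi reads these paths at startup, so changes take effect after a restart.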
[1] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#repositories
[2] https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html

On Wed, May 17, 2017 at 9:27 AM Ali Nazemian <[email protected]> wrote:

> Hi Joe,
>
> Can you please explain why we would still see a performance increase from
> using multiple volumes for each repository? In practice, using different
> volumes for the FlowFile, Provenance and Content repositories would already
> overcome the space collision problem. Based on the example above, would
> 100GB FlowFile, 1TB prov and 4TB Content repo still have less throughput
> than 100GB FlowFile, 2x500GB prov and 8x500GB Content repo in a fully
> virtualized environment?
>
> Regards,
> Ali
>
> On Wed, May 17, 2017 at 10:06 PM, Joe Skora <[email protected]> wrote:
>
>> Ali,
>>
>> If you can separate the repositories onto separate physical spindles, I
>> would expect a performance benefit, but if they are all on virtualized
>> storage I'd expect less benefit from separate volumes. Even on
>> virtualized storage, though, separate volumes can help reduce space
>> collision problems, preventing runaway system logs or the provenance
>> repository, for instance, from filling the disk and running the content
>> repository out of space.
>>
>> Regards,
>> Joe S
>>
>> On Wed, May 17, 2017 at 5:00 AM, Ali Nazemian <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering whether there is any throughput benefit to having
>>> multiple disk mount points for the FlowFile, Provenance and Content
>>> repositories, versus a single mount point for all of them, if we are
>>> using a fully virtualized deployment with shared storage. Suppose we
>>> have 500TB disks in the shared storage. Which do you suggest: 100GB for
>>> FlowFile, 2x500GB for Provenance and 8x500GB for the Content repository,
>>> or a single 5.1TB mount point for the entire instance? In other words,
>>> is it better for NiFi to keep track of load across the disk mount
>>> points, or to delegate that entirely to the shared storage?
>>>
>>> Regards,
>>> Ali
>>
>
> --
> A.Nazemian
