Hi, my name is Rifat. I am a software engineer at ESPN/Disney. I have been using NiFi for almost a year now, and we have a 10-node NiFi cluster set up in our production environment. As per the best practices document:
https://community.cloudera.com/t5/Community-Articles/NiFi-Sizing-Guide-Deployment-Best-Practices/ta-p/246781
I would like to have 5 separate repositories for the content repo, 5 for the provenance repo and 1 for the FlowFile repo per node. I need your expert advice on whether EBS or EFS is the better approach to achieve this goal. I already tried EFS and saw some problems with load balancing, since I mounted one EFS volume per partition (that's 11 EFS partitions per node).

This is the response I got from AWS after raising a support ticket with them:

Hello Rifat,

Thank you for contacting AWS Premium Support.

I am not familiar with Apache NiFi, as this is third-party software not covered by our support policy [1]. That being said, I had a look at its documentation [2] and some related links, and it doesn't seem to me that these repositories are meant to be on shared storage accessible by all nodes. If you look at the NiFi Architecture section, it's suggested that this data is stored locally on each node, and the guidelines in the link you provided us with are aligned with that principle. I did find some connection between NiFi and Hadoop Distributed File System (HDFS), which has some fundamental differences to EFS; however, it doesn't seem to have any relation to these repositories.

While EFS itself provides strong durability and availability guarantees, the NFS protocol is meant to provide weaker cache coherence among its clients as a trade-off for higher performance. Characteristics such as Attribute Caching, Directory Entry Caching, Asynchronous writes, and the differences in how file timestamps are maintained lead to discrepancies in how each node sees data, potentially impacting clustered applications expecting strong consistency. You'll find a good write-up on that in the Linux NFS documentation [3], section "Data and Metadata Coherence".
To see if one of these characteristics is causing the issue, I advise you to append 'sync' and 'noac' as mount options for all EFS resources on all nodes; the first one will cause all write I/O to become synchronous, and the second one will disable Attribute and Directory Entry caching. If that helps resolve the issues you are seeing, we'll know that NiFi is expecting strong cache coherence. However, you'll need to evaluate whether the performance penalty of mounting with these options is bearable. It may be that EBS or even Instance Store is a better option to host these repositories, provided that you understand the differences in performance and durability between the two.

On a side note, you are missing a few of the recommended mount options for EFS. Although I don't expect them to have an immediate impact on the issue described in this support case, it's a good idea to implement them to avoid other issues. Please check here [4] for details.

Regarding your question on how to enable communication between directories that are mounted on different EFS file systems, this whole idea of inter-EFS communication does not apply. EFS is a file system, and there's no exchange of data between separate EFS resources; the only "communication" in that sense would be moving data from one EFS to another, which can be done within an instance that has both file systems mounted. I believe that at this stage, testing the solution with the proposed mount options above is a good course of action to isolate the problem.

With regard to your comment on logging into these machines and reading the contents of /var/log/nifi, please note that Support personnel are not allowed under any circumstances to access customers' instances. At this stage, I believe that these logs are not required for this case.

To summarise, my first advice is that you seek advice from NiFi experts on whether using a distributed file system such as EFS to host cluster nodes' repositories is a valid approach.
If you are using Cloudera Flow Management, you should be able to receive support from Cloudera; otherwise the NiFi Community is an option [6]. The second advice is to test EFS mounted with 'sync' and 'noac' to see if it helps resolve the issue; if the performance penalty is unbearable, consider switching the repositories to EBS or Instance Store volumes.

If you have questions on the above, please let me know.

Please let me know the best approach to take to solve this problem.

Best Regards,
Rifat
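
For illustration, the per-node layout described above (5 content, 5 provenance and 1 FlowFile repository) could be generated for nifi.properties roughly as follows. This is only a sketch under stated assumptions: the mount root /mnt/nifi, the directory names and the suffixes content1..content5 and prov1..prov5 are hypothetical, and nothing here depends on whether the underlying volumes are EBS, EFS or Instance Store. The property key prefixes are the ones NiFi uses for its repository directories.

```python
# Sketch: build nifi.properties entries for one FlowFile repository plus
# several content and provenance repositories, each on its own mount point.
# The mount root, directory names and key suffixes are hypothetical.

def repo_properties(mount_root="/mnt/nifi", content=5, provenance=5):
    """Return a dict mapping nifi.properties keys to repository directories."""
    props = {
        "nifi.flowfile.repository.directory": f"{mount_root}/flowfile_repo",
    }
    for i in range(1, content + 1):
        key = f"nifi.content.repository.directory.content{i}"
        props[key] = f"{mount_root}/content_repo{i}"
    for i in range(1, provenance + 1):
        key = f"nifi.provenance.repository.directory.prov{i}"
        props[key] = f"{mount_root}/provenance_repo{i}"
    return props


if __name__ == "__main__":
    # Print the entries in key=value form, ready to paste into nifi.properties.
    for key, value in repo_properties().items():
        print(f"{key}={value}")
```

With the default arguments this yields 11 entries per node, one per partition, so each directory can be backed by its own volume.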