Hi NiFi team,

One of the nodes in our cluster went offline today. We eventually resolved the issue, but it exposed some problems in the configuration of our edge NiFi instances.
Right now we have non-clustered instances of NiFi distributed around the world, pushing data back to a three-node cluster via Site-to-Site. All of these instances are configured with the name of the first node (node01) and pull the peer list and weights from it. But node01 is the node that went offline today. Some Site-to-Site connections appeared to use cached data and continued uploading to node02 and node03, but many went down because they could not pull the peer list from the cluster, which makes perfect sense to me.

One thing I was curious about: how long is a peer list cached if an updated list can't be retrieved from the cluster?

What are the best practices for fixing this? We were throwing around ideas of using a load balancer or a round-robin DNS name as the entry point for Site-to-Site, but I figured others have probably already tackled this problem and could share some ideas.

Thanks,
Peter
