Hi NiFi team,

One of the nodes in our cluster went offline today. We eventually resolved the
problem, but it exposed some weaknesses in how our edge NiFi instances are
configured.

Right now we have non-clustered instances of NiFi distributed around the world,
pushing data back to a three-node cluster via site-to-site. All of these
instances use the name of the first node (node01) and pull the peer list and
weights from it. But node01 is the node that went offline today. Some
site-to-site connections appeared to use cached data and continued uploading
to node02 and node03, but many went down because they could not pull the peer
list from the cluster, which makes perfect sense to me.

One thing I'm curious about: how long is a peer list cached if an updated list
can't be retrieved from the cluster?

What are the best practices for fixing this? We were throwing around ideas of
using a load balancer or a round-robin DNS name as the entry point for
site-to-site, but I figured others have probably already tackled this problem
and could share some ideas.

Thanks,
  Peter
