Thanks Koji, exactly the answers I was looking for.

I've switched all of the Remote Process Groups over to use the list of nodes.

--Peter

-----Original Message-----
From: Koji Kawamura [mailto:ijokaruma...@gmail.com] 
Sent: Thursday, September 27, 2018 7:08 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Cluster Peer Lists

Hi Peter,

The Site-to-Site client refreshes its remote peer list every 60 seconds.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L60

The address configured when setting up an S2S client is used to get the remote 
peer list initially.
After that, the client knows that node01, 02 and 03 are available peers. When 
it refreshes the peer list, even if it fails to access node01, it should 
retrieve the updated peer list from node02 or 03. However, as long as node01 is 
still a member of the remote cluster (until it is removed from the cluster, 
node02 and 03 still think it is part of the cluster), the returned peer list 
contains node01.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L383
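
The refresh behavior described above can be sketched roughly as follows. This 
is a simplified Python illustration, not the actual PeerSelector code: 
`refresh_peer_list` and `fetch_peers` are hypothetical names, `fetch_peers` 
stands in for the remote call that asks one node for the cluster's peer list, 
and falling back to the cached list when no node answers is an assumption of 
this sketch.

```python
from typing import Callable, Iterable, List


def refresh_peer_list(
    bootstrap_urls: Iterable[str],
    known_peers: Iterable[str],
    fetch_peers: Callable[[str], List[str]],
) -> List[str]:
    """Ask each candidate node for the peer list; return the first answer.

    fetch_peers is a hypothetical callable that raises OSError (for example
    ConnectionError) when the node it is asked to contact is unreachable.
    """
    known = list(known_peers)
    # Try the configured bootstrap address(es) first, then peers learned earlier.
    candidates = list(dict.fromkeys([*bootstrap_urls, *known]))
    for url in candidates:
        try:
            return fetch_peers(url)
        except OSError:
            continue  # this node is unreachable, try the next one
    # Assumption for this sketch: keep the cached list if nobody answers.
    return known
```

Note that the list returned by node02 or node03 can still include node01 until 
the cluster itself drops it.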

Another thing to note is that the S2S client calculates the destinations for 
the next 128 transactions in advance.
So, if your client does not make transactions often, it may take longer before 
the next destinations are re-calculated.
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/PeerSelector.java#L159
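
To illustrate why a slow client keeps using a stale destination list for a 
while, here is a minimal Python sketch. It is not the real implementation: the 
class name `DestinationQueue` is hypothetical, and plain round robin is used 
here as a stand-in for however the client actually distributes destinations 
across peers. Only the batch size of 128 comes from the explanation above.

```python
from collections import deque
from itertools import cycle, islice

BATCH_SIZE = 128  # destinations are computed this many transactions ahead


class DestinationQueue:
    """Hands out pre-computed destinations, refilling only when empty."""

    def __init__(self, peers):
        self.peers = list(peers)
        self._queue = deque()

    def update_peers(self, peers):
        # A new peer list only takes effect at the next refill.
        self.peers = list(peers)

    def next_destination(self):
        if not self._queue:
            # Round robin is an assumption made for this sketch.
            self._queue.extend(islice(cycle(self.peers), BATCH_SIZE))
        return self._queue.popleft()
```

With this model, a peer removed through `update_peers` can still be handed out 
until the current batch of 128 is used up, which takes longer for a client 
that sends rarely.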

To avoid having a single host address in the S2S client configuration, you can 
specify multiple addresses delimited by commas.
With this, the S2S client can connect when it is restarted even if node01 is down.
E.g. http://node01:8080,http://node02:8080,http://node03:8080
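
As a rough illustration of the comma-delimited format, the following Python 
sketch splits such a value into individual bootstrap addresses. The function 
name `parse_bootstrap_urls` and its validation rules are assumptions for this 
example and do not describe how NiFi itself parses the setting.

```python
from urllib.parse import urlsplit


def parse_bootstrap_urls(value):
    """Split a comma-delimited URL string into a list of bootstrap URLs."""
    urls = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and surrounding whitespace
        parsed = urlsplit(part)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"not a valid http(s) URL: {part!r}")
        urls.append(part)
    return urls


bootstrap = parse_bootstrap_urls(
    "http://node01:8080,http://node02:8080,http://node03:8080"
)
```

Any one of the resulting addresses that is reachable is enough to fetch the 
initial peer list.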

Alternatively, a round-robin DNS name or a reverse proxy can be used as the 
bootstrap node address in a similar way.

Thanks,
Koji


On Fri, Sep 28, 2018 at 4:30 AM Peter Wicks (pwicks) <pwi...@micron.com> wrote:
>
> Hi NiFi team,
>
>
>
> We had one of the nodes in our cluster go offline today. We eventually 
> resolved the issue, but it exposed some issues in our configuration across 
> our edge NiFi instances.
>
>
>
> Right now we have non-clustered instances of NiFi distributed around the 
> world, pushing data back to a three node cluster via Site-to-Site. All of 
> these instances use the name of the first node (node01), and pull back the 
> peer list and weights from it. But node01 is the node that went offline 
> today, and while some site-to-site connections appeared to use cached data 
> and continued uploading data to node02 and node03, many of the site-to-site 
> connections went down because they were not able to pull the peer list from 
> the cluster, which makes perfect sense to me.
>
>
>
> One question that I was curious about: how long is a peer list cached for if 
> an updated list can’t be retrieved from the cluster?
>
>
>
> What are the best practices for fixing this? We were throwing around ideas of 
> using a load balancer or round robin DNS name as the entry point for 
> site-to-site, but I figured others have probably already tackled this problem 
> before and could share some ideas.
>
>
>
> Thanks,
>
>   Peter
