To add to my post: instead of using the pod IP for the `jobmanager.rpc.address`
configuration, we start each JM pod with its fully qualified name, `--host
<pod-name>.<stateful-set-name>.ns.svc:8081`, and this address gets persisted
to the ConfigMaps. In some scenarios, the leader addresses stored in the
different ConfigMaps do not all point to the same JM.

For example, let's assume I have 3 JMs:

jm-0.jm-statefulset.ns.svc:8081 <-- Leader
jm-1.jm-statefulset.ns.svc:8081
jm-2.jm-statefulset.ns.svc:8081

I have seen the ConfigMaps in the following state:

RestServer ConfigMap Address: jm-0.jm-statefulset.ns.svc:8081
Dispatcher ConfigMap Address: jm-1.jm-statefulset.ns.svc:8081
ResourceManager ConfigMap Address: jm-0.jm-statefulset.ns.svc:8081

Is this the correct behaviour?
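
To make the mismatch easier to spot, here is a minimal Python sketch of the check I have in mind. It assumes the three leader addresses have already been read out of the ConfigMaps by some other means; the function name and the dictionary keys are made up for illustration and are not Flink or Kubernetes identifiers.

```python
from collections import defaultdict


def group_components_by_leader_host(leader_addresses):
    """Group component names by the host part of their leader address.

    `leader_addresses` maps a component label (hypothetical keys) to the
    "<host>:<port>" string found in that component's ConfigMap.
    """
    by_host = defaultdict(list)
    for component, address in leader_addresses.items():
        # Drop the trailing ":<port>" so that only host names are compared.
        host = address.rsplit(":", 1)[0]
        by_host[host].append(component)
    return dict(by_host)


# The state described above: one ConfigMap points at jm-1, the others at jm-0.
observed = {
    "restserver": "jm-0.jm-statefulset.ns.svc:8081",
    "dispatcher": "jm-1.jm-statefulset.ns.svc:8081",
    "resourcemanager": "jm-0.jm-statefulset.ns.svc:8081",
}

groups = group_components_by_leader_host(observed)
if len(groups) > 1:
    print("Leader addresses disagree:", groups)
```

More than one key in the result means the ConfigMaps name different JMs as leader, which is the state I am asking about.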

I have then seen the TM pods fail to connect with the following exception:

```
java.util.concurrent.CompletionException:
org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token
not set: Ignoring message
RemoteFencedMessage(b870874c1c590d593178811f052a42c9,
RemoteRpcInvocation(registerTaskExecutor(TaskExecutorRegistration, Time)))
sent to
akka.tcp://fl...@jm-1.jm-statefulset.ns.svc:6123/user/rpc/resourcemanager_0
because the fencing token is null.
```

Till explains this error in the following comment:
https://issues.apache.org/jira/browse/FLINK-18367?focusedCommentId=17141070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17141070

Has anyone else seen this?

Thanks!

Enrique



