Hi Aleksandar,

The resource manager address is retrieved from the HA services.
Would you check whether your customized HA services is returning the right
LeaderRetrievalService and whether the LeaderRetrievalService is really
retrieving the right leader's address?
Or is it possible that the stored resource manager address in HA is
replaced by jobmanager address in any case?

Thanks,
Zhu Zhu

Aleksandar Mastilovic <amastilo...@sightmachine.com> 于2019年8月22日周四 上午8:16写道:

> Hi all,
>
> I’m experimenting with using my own implementation of HA services instead
> of ZooKeeper that would persist JobManager information on a Kubernetes
> volume instead of in ZooKeeper.
>
> I’ve set the high-availability option in flink-conf.yaml to the FQN of my
> factory class, and started the docker ensemble as I usually do (i.e. with
> no special “cluster” arguments or scripts.)
>
> What’s happening now is that TaskManager is unable to connect to
> ResourceManager, because it seems it’s trying to use the /user/jobmanager
> path instead of /user/resourcemanager.
>
> Here’s what I found in the logs:
>
>
> jobmanager_1    | 2019-08-22 00:05:00,963 INFO  akka.remote.Remoting
>                                    - Remoting started; listening on
> addresses :[akka.tcp://flink@jobmanager:6123]
> jobmanager_1    | 2019-08-22 00:05:00,975 INFO
>  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor
> system started at akka.tcp://flink@jobmanager:6123
>
> jobmanager_1    | 2019-08-22 00:05:02,380 INFO
>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
> RPC endpoint for
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
> akka://flink/user/resourcemanager .
>
> jobmanager_1    | 2019-08-22 00:05:03,138 INFO
>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
> at akka://flink/user/dispatcher .
>
> jobmanager_1    | 2019-08-22 00:05:03,211 INFO
>  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
> ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager was
> granted leadership with fencing token 00000000000000000000000000000000
>
> jobmanager_1    | 2019-08-22 00:05:03,292 INFO
>  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
> akka.tcp://flink@jobmanager:6123/user/dispatcher was granted leadership
> with fencing token 00000000-0000-0000-0000-000000000000
>
> taskmanager_1   | 2019-08-22 00:05:03,713 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
> to ResourceManager
> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000)
> .
> taskmanager_1   | 2019-08-22 00:05:04,137 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms:
> Could not connect to rpc endpoint under address
> akka.tcp://flink@jobmanager:6123/user/jobmanager..
>
> Is this a known bug? I’d appreciate any help I can get.
>
> Thanks,
> Aleksandar Mastilovic
>

Reply via email to