Re: TaskManager not connecting to ResourceManager in HA mode

Zili Chen Wed, 21 Aug 2019 19:17:44 -0700

Hi Aleksandar,

base on your log:


taskmanager_1   | 2019-08-22 00:05:03,713 INFO
 org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
to ResourceManager
akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000)
.
taskmanager_1   | 2019-08-22 00:05:04,137 INFO
 org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
resolve ResourceManager address
akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms:
Could not connect to rpc endpoint under address
akka.tcp://flink@jobmanager:6123/user/jobmanager..

it looks like you return a jobmanager address on retrieval service of
resource manager. Please check the implementation carefully or share it on
mailing list that others can help for investigation.

Best,
tison.


Zhu Zhu <reed...@gmail.com> 于2019年8月22日周四 上午10:11写道：

> Hi Aleksandar,
>
> The resource manager address is retrieved from the HA services.
> Would you check whether your customized HA services is returning the
> right  LeaderRetrievalService and whether the LeaderRetrievalService is
> really retrieving the right leader's address?
> Or is it possible that the stored resource manager address in HA is
> replaced by jobmanager address in any case?
>
> Thanks,
> Zhu Zhu
>
> Aleksandar Mastilovic <amastilo...@sightmachine.com> 于2019年8月22日周四
> 上午8:16写道：
>
>> Hi all,
>>
>> I’m experimenting with using my own implementation of HA services instead
>> of ZooKeeper that would persist JobManager information on a Kubernetes
>> volume instead of in ZooKeeper.
>>
>> I’ve set the high-availability option in flink-conf.yaml to the FQN of my
>> factory class, and started the docker ensemble as I usually do (i.e. with
>> no special “cluster” arguments or scripts.)
>>
>> What’s happening now is that TaskManager is unable to connect to
>> ResourceManager, because it seems it’s trying to use the /user/jobmanager
>> path instead of /user/resourcemanager.
>>
>> Here’s what I found in the logs:
>>
>>
>> jobmanager_1    | 2019-08-22 00:05:00,963 INFO  akka.remote.Remoting
>>                                      - Remoting started; listening on
>> addresses :[akka.tcp://flink@jobmanager:6123]
>> jobmanager_1    | 2019-08-22 00:05:00,975 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor
>> system started at akka.tcp://flink@jobmanager:6123
>>
>> jobmanager_1    | 2019-08-22 00:05:02,380 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>> akka://flink/user/resourcemanager .
>>
>> jobmanager_1    | 2019-08-22 00:05:03,138 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>> at akka://flink/user/dispatcher .
>>
>> jobmanager_1    | 2019-08-22 00:05:03,211 INFO
>>  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>> ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager
>> was granted leadership with fencing token 00000000000000000000000000000000
>>
>> jobmanager_1    | 2019-08-22 00:05:03,292 INFO
>>  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher
>> akka.tcp://flink@jobmanager:6123/user/dispatcher was granted leadership
>> with fencing token 00000000-0000-0000-0000-000000000000
>>
>> taskmanager_1   | 2019-08-22 00:05:03,713 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting
>> to ResourceManager
>> akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000)
>> .
>> taskmanager_1   | 2019-08-22 00:05:04,137 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
>> resolve ResourceManager address
>> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 10000 ms:
>> Could not connect to rpc endpoint under address
>> akka.tcp://flink@jobmanager:6123/user/jobmanager..
>>
>> Is this a known bug? I’d appreciate any help I can get.
>>
>> Thanks,
>> Aleksandar Mastilovic
>>
>

Re: TaskManager not connecting to ResourceManager in HA mode

Reply via email to