Thanks for all the help, people - you made me go through my code once again and discover that I switched argument positions for job manager and resource manager addresses :-)
The docker ensemble now starts fine, I’m working on ironing out the bugs now. I’ll participate in the survey too! > On Aug 21, 2019, at 7:18 PM, Zili Chen <wander4...@gmail.com> wrote: > > Besides, would you like to participant our survey thread[1] on > user list about "How do you use high-availability services in Flink?" > > It would help Flink improve its high-availability serving. > > Best, > tison. > > [1] > https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E > > <https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E> > > Zili Chen <wander4...@gmail.com <mailto:wander4...@gmail.com>> 于2019年8月22日周四 > 上午10:16写道: > Hi Aleksandar, > > base on your log: > > taskmanager_1 | 2019-08-22 00:05:03,713 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to > ResourceManager > akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000) > <>. > taskmanager_1 | 2019-08-22 00:05:04,137 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not > resolve ResourceManager address > akka.tcp://flink@jobmanager:6123/user/jobmanager <>, retrying in 10000 ms: > Could not connect to rpc endpoint under address > akka.tcp://flink@jobmanager:6123/user/jobmanager <>.. > > it looks like you return a jobmanager address on retrieval service of > resource manager. Please check the implementation carefully or share it on > mailing list that others can help for investigation. > > Best, > tison. > > > Zhu Zhu <reed...@gmail.com <mailto:reed...@gmail.com>> 于2019年8月22日周四 > 上午10:11写道: > Hi Aleksandar, > > The resource manager address is retrieved from the HA services. > Would you check whether your customized HA services is returning the right > LeaderRetrievalService and whether the LeaderRetrievalService is really > retrieving the right leader's address? > Or is it possible that the stored resource manager address in HA is replaced > by jobmanager address in any case? > > Thanks, > Zhu Zhu > > Aleksandar Mastilovic <amastilo...@sightmachine.com > <mailto:amastilo...@sightmachine.com>> 于2019年8月22日周四 上午8:16写道: > Hi all, > > I’m experimenting with using my own implementation of HA services instead of > ZooKeeper that would persist JobManager information on a Kubernetes volume > instead of in ZooKeeper. > > I’ve set the high-availability option in flink-conf.yaml to the FQN of my > factory class, and started the docker ensemble as I usually do (i.e. with no > special “cluster” arguments or scripts.) > > What’s happening now is that TaskManager is unable to connect to > ResourceManager, because it seems it’s trying to use the /user/jobmanager > path instead of /user/resourcemanager. > > Here’s what I found in the logs: > > > jobmanager_1 | 2019-08-22 00:05:00,963 INFO akka.remote.Remoting > - Remoting started; listening on addresses > :[akka.tcp://flink@jobmanager:6123 <>] > jobmanager_1 | 2019-08-22 00:05:00,975 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system > started at akka.tcp://flink@jobmanager:6123 <> > > jobmanager_1 | 2019-08-22 00:05:02,380 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC > endpoint for > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at > akka://flink/user/resourcemanager <> . > > jobmanager_1 | 2019-08-22 00:05:03,138 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC > endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at > akka://flink/user/dispatcher <> . > > jobmanager_1 | 2019-08-22 00:05:03,211 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - > ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager <> was > granted leadership with fencing token 00000000000000000000000000000000 > > jobmanager_1 | 2019-08-22 00:05:03,292 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher > akka.tcp://flink@jobmanager:6123/user/dispatcher <> was granted leadership > with fencing token 00000000-0000-0000-0000-000000000000 > > taskmanager_1 | 2019-08-22 00:05:03,713 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to > ResourceManager > akka.tcp://flink@jobmanager:6123/user/jobmanager(00000000000000000000000000000000) > <>. > taskmanager_1 | 2019-08-22 00:05:04,137 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not > resolve ResourceManager address > akka.tcp://flink@jobmanager:6123/user/jobmanager <>, retrying in 10000 ms: > Could not connect to rpc endpoint under address > akka.tcp://flink@jobmanager:6123/user/jobmanager <>.. > > Is this a known bug? I’d appreciate any help I can get. > > Thanks, > Aleksandar Mastilovic