Anybody else seen this and know the solution? We're dead in the water with Flink 1.5.4.
On Sun, Sep 23, 2018 at 11:46 PM alex <ek.rei...@gmail.com> wrote:
> We started to see the same errors after upgrading to Flink 1.6.0 from 1.4.2.
> We have one JM and 5 TMs on Kubernetes. The JM is running in HA mode. The
> TaskManagers sometimes lose their connection to the JM and log the following
> error, like yours:
>
> 2018-09-19 12:36:40,687 INFO
>   org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not
>   resolve ResourceManager address
>   akka.tcp://flink@flink-jobmanager:50002/user/resourcemanager, retrying in
>   10000 ms: Ask timed out on
>   [ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:50002/),
>   Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
>   of type "akka.actor.Identify".
>
> Once a TM starts reporting "Could not resolve ResourceManager", it never
> recovers on its own; I have to restart the TM pod.
>
> Here is the content of our flink-conf.yaml:
>
> blob.server.port: 6124
> jobmanager.rpc.address: flink-jobmanager
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 4096
> jobmanager.web.history: 20
> jobmanager.archive.fs.dir: s3://our_path
> taskmanager.rpc.port: 6121
> taskmanager.heap.mb: 16384
> taskmanager.numberOfTaskSlots: 10
> taskmanager.log.path: /opt/flink/log/output.log
> web.log.path: /opt/flink/log/output.log
> state.checkpoints.num-retained: 3
> metrics.reporters: prom
> metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>
> high-availability: zookeeper
> high-availability.jobmanager.port: 50002
> high-availability.zookeeper.quorum: zookeeper_instance_list
> high-availability.zookeeper.path.root: /flink
> high-availability.cluster-id: profileservice
> high-availability.storageDir: s3://our_path
>
> Any help will be greatly appreciated!
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
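
For what it's worth, the log above shows the "akka.actor.Identify" ask timing
out after the default 10 s, so one thing we're experimenting with on our side
is raising the Akka timeouts in flink-conf.yaml. This is an untested guess
based only on the timeout in the log, not a confirmed fix:

    # Give the TaskManager longer to identify and resolve the
    # ResourceManager actor before the ask times out (defaults are 10 s).
    akka.ask.timeout: 60 s
    akka.lookup.timeout: 30 s

That would only help if resolution is merely slow; since the TM apparently
never recovers until the pod is restarted, the root cause may be elsewhere.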