Anybody else seen this and know the solution?  We're dead in the water with
Flink 1.5.4.

On Sun, Sep 23, 2018 at 11:46 PM alex <ek.rei...@gmail.com> wrote:

> We started to see the same errors after upgrading to Flink 1.6.0 from 1.4.2.
> We have one JM and 5 TMs on Kubernetes, with the JM running in HA mode. The
> TaskManagers sometimes lose their connection to the JM and hit the same
> error you have:
>
> 2018-09-19 12:36:40,687 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:50002/user/resourcemanager, retrying in
> 10000 ms: Ask timed out on
> [ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:50002/),
> Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message
> of type "akka.actor.Identify".
>
> Once a TM starts logging "Could not resolve ResourceManager", it never
> recovers on its own; it stays in that state until I restart the TM pod.
>
> Here is the content of our flink-conf.yaml:
> blob.server.port: 6124
> jobmanager.rpc.address: flink-jobmanager
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 4096
> jobmanager.web.history: 20
> jobmanager.archive.fs.dir: s3://our_path
> taskmanager.rpc.port: 6121
> taskmanager.heap.mb: 16384
> taskmanager.numberOfTaskSlots: 10
> taskmanager.log.path: /opt/flink/log/output.log
> web.log.path: /opt/flink/log/output.log
> state.checkpoints.num-retained: 3
> metrics.reporters: prom
> metrics.reporter.prom.class:
> org.apache.flink.metrics.prometheus.PrometheusReporter
>
> high-availability: zookeeper
> high-availability.jobmanager.port: 50002
> high-availability.zookeeper.quorum: zookeeper_instance_list
> high-availability.zookeeper.path.root: /flink
> high-availability.cluster-id: profileservice
> high-availability.storageDir: s3://our_path
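>
> Note the ResourceManager address in the log (port 50002) matches
> high-availability.jobmanager.port above, so the address itself looks
> consistent; the Akka ask simply times out after the default 10000 ms. One
> thing we are still experimenting with, with no confirmation yet that it
> helps, is raising the Akka timeouts in flink-conf.yaml so the TM waits
> longer before each retry:
>
> akka.ask.timeout: 60 s
> akka.lookup.timeout: 30 s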
>
> Any help will be greatly appreciated!
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>