Re: flink:latest container on kubernetes fails to connect taskmanager to jobmanager

2018-10-01 Thread jwatte
It turns out that the current flink:latest Docker image is 5 days old, and
this bug was fixed 4 days ago in the flink-docker GitHub repository.

The problem is that the docker-entrypoint.sh script chains to jobmanager.sh
by saying "start-foreground cluster" where the "cluster" argument is
obsolete as of Flink 1.5.

I patched it with a sed command in the Kubernetes manifest, until the
updated Docker image makes its way out to the world.
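
For anyone hitting the same thing, here is a rough sketch of that kind of
sed patch, not the exact command from my manifest. It assumes the script
contains the literal text "start-foreground cluster" and that GNU sed is
available in the image; the function name patch_entrypoint and the sample
line are made up for illustration. The inbound address flink@cluster in the
jobmanager log of the original message suggests that "cluster" was being
taken as the host name.

```shell
#!/bin/sh
# Sketch only: the sample line below is a stand-in, not the real
# contents of docker-entrypoint.sh.

patch_entrypoint() {
    # Drop the obsolete "cluster" argument after start-foreground.
    # Uses in-place editing as supported by GNU sed (assumption).
    sed -i 's/start-foreground cluster/start-foreground/' "$1"
}

# Try it on a scratch file before touching the real script.
tmp=$(mktemp)
printf '%s\n' 'exec jobmanager.sh start-foreground cluster' > "$tmp"
patch_entrypoint "$tmp"
cat "$tmp"   # prints: exec jobmanager.sh start-foreground
```

In a manifest, the same substitution would have to run against the
entrypoint script inside the container before the script itself starts;
the path to that script depends on the image, so check it first.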



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


flink:latest container on kubernetes fails to connect taskmanager to jobmanager

2018-10-01 Thread jwatte
I'm using the standard Kubernetes deploy configs for jobmanager and
taskmanager deployments, and jobmanager service.
However, when the task managers start up, they try to register with the job
manager over Akka on port 6123.
This fails because Akka on the jobmanager discards those messages as
"non-local."

The taskmanager keeps repeating this log message and eventually exits
(and gets restarted by Kubernetes):

2018-10-01 20:08:28,365 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not
resolve ResourceManager address
akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in
1 ms: Ask timed out on
[ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:6123/),
Path(/user/resourcemanager)]] after [1 ms]. Sender[null] sent message of
type "akka.actor.Identify"..

The jobmanager responds with this log message:

2018-10-01 20:09:38,475 ERROR akka.remote.EndpointWriter
- dropping message [class akka.actor.ActorSelectionMessage] for non-local
recipient [Actor[akka.tcp://flink@flink-jobmanager:6123/]] arriving at
[akka.tcp://flink@flink-jobmanager:6123] inbound addresses are
[akka.tcp://flink@cluster:6123]

I have verified that network connectivity exists, so this is some
configuration problem.
I notice that the docker-entrypoint.sh edits the config files and calls the
taskmanager.sh / jobmanager.sh scripts based on start mode.
Is the script editing the config file incorrectly? What needs to be done so
that Akka on the jobmanager accepts the registration messages?



