For example, I get the following error when trying to trigger a savepoint for a job:

 The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running.
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:960)
	at org.apache.flink.client.program.ClusterClient.triggerSavepoint(ClusterClient.java:737)
	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:771)
	at org.apache.flink.client.cli.CliFrontend.lambda$checkpoint$10(CliFrontend.java:760)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1044)
	at org.apache.flink.client.cli.CliFrontend.checkpoint(CliFrontend.java:759)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1127)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$12(CliFrontend.java:1188)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1188)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
	at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:83)
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:955)
	... 12 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [20000 milliseconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
	at scala.concurrent.Await$.result(package.scala:190)
	at scala.concurrent.Await.result(package.scala)
	at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:81)
	... 13 more

There is no error when I run the same operation with the 1.7 client against a 1.6
(legacy execution mode) job. This looks like a firewall issue, so I'm trying to
pin the ports to the open ranges, but I'm not sure what I have to change.
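Before changing any Flink configuration, one quick way to confirm the firewall theory is to probe the port from the client machine. Below is a minimal Java sketch, assuming you already know which host and port the JobManager (or its REST endpoint) is actually listening on. `PortProbe`, `isReachable`, `jobmanager.example.com` and port 150 are hypothetical placeholders, not Flink APIs; the probe only opens a plain TCP connection with a timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical TCP reachability probe (not part of Flink), meant to be run
 * from the client machine to check whether a firewall blocks a given port.
 */
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the real JobManager host and bound port as arguments.
        String host = args.length > 0 ? args[0] : "jobmanager.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 150;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 5000));
    }
}
```

A timeout, rather than an immediate refusal, is often a sign that packets are being silently dropped, which would be consistent with the 20000 ms timeout in the trace above.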

Gyula

On Tue, 4 Dec 2018 at 15:11, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi!
>
> We have been running Flink on Yarn for quite some time and historically we
> specified port ranges so that the client can access the cluster:
>
> yarn.application-master.port: 100-200
>
> Now we have updated to Flink 1.7 and are trying to migrate away from the legacy
> execution mode, but we have run into a problem: we cannot connect to the
> running job from the command-line client as we could before.
>
> What is the equivalent port configuration that would ensure that the ports
> the client needs to access fall between 100 and 200?
>
> Thanks,
> Gyula
>
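Whichever option turns out to control the port in the new execution mode, it helps to check mechanically that a configured value stays inside the firewall window. The sketch below assumes the value uses the same comma-separated port/range style as `yarn.application-master.port` (for example `100-200` or `50010,50020-50025`). `PortRangeCheck`, `expand` and `withinWindow` are hypothetical helpers, not Flink's own parser, and the sketch does not answer which Flink 1.7 key needs to be set.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Flink's API): expands a port spec such as
 * "100-200" or "50010,50020-50025" and checks that every port lies in a window.
 */
public class PortRangeCheck {

    /** Expands a spec such as "100-102,150" into [100, 101, 102, 150]. */
    static List<Integer> expand(String spec) {
        List<Integer> ports = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dash = p.indexOf('-');
            if (dash < 0) {
                ports.add(Integer.parseInt(p)); // single port
            } else {
                int start = Integer.parseInt(p.substring(0, dash).trim());
                int end = Integer.parseInt(p.substring(dash + 1).trim());
                if (start > end) {
                    throw new IllegalArgumentException("Invalid range: " + p);
                }
                for (int port = start; port <= end; port++) {
                    ports.add(port); // inclusive range
                }
            }
        }
        return ports;
    }

    /** True if every port in the spec lies within [low, high]. */
    static boolean withinWindow(String spec, int low, int high) {
        for (int port : expand(spec)) {
            if (port < low || port > high) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(withinWindow("100-200", 100, 200));      // true
        System.out.println(withinWindow("100-200,8081", 100, 200)); // false
    }
}
```

Note that an empty spec passes vacuously; a stricter check could also reject a spec that expands to no ports.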
