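Hi,

we had other users with a similar issue as well. There is a configuration value that lets you specify a single port or a range of ports for the JobManager to allocate when running on YARN, so the cluster admins only need to open or forward those ports. Note that with a single port, JobManagers that end up on the same host may collide when they try to bind it, so a range is usually the safer choice.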
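Once a port range has been agreed with the admins, it can help to check from the VM which of those ports actually accept TCP connections, since ping succeeding says nothing about a firewalled port. Below is a minimal sketch in Python, not anything shipped with Flink. The helper names (`parse_port_spec`, `is_reachable`, `report_reachability`) are made up for illustration, and the spec format (single port, `low-high` range, or comma-separated list) is an assumption about how such a value is written. In newer Flink versions I believe the key is `yarn.application-master.port`, but please check the configuration docs for your version before relying on that name.

```python
import socket


def parse_port_spec(spec):
    """Expand a spec such as "50100-50102" or "50010,50020-50021" into a sorted list of ports.

    The spec format is an assumption for this sketch, not taken from the Flink docs.
    """
    ports = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part}")
            ports.update(range(low, high + 1))
        else:
            ports.add(int(part))
    for port in ports:
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    return sorted(ports)


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def report_reachability(host, spec, timeout=3.0):
    """Map every port in the spec to whether it accepted a TCP connection."""
    return {port: is_reachable(host, port, timeout) for port in parse_port_spec(spec)}
```

For example, `report_reachability("145.100.41.148", "50100-50110")` (the range here is hypothetical) would tell you from the VM which ports in the range are open towards that node. A port only shows up as reachable while something is listening on it, so run the check while the YARN session is up.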
On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com> wrote:

> Hi Stephan,
>
> surely it seems this way! I must not be the first with this issue though?
> I'll have to contact the cluster admins to find a solution together. What
> would be a way to make the JobManagers accessible from outside the network,
> given that the IP and port number change every time?
>
> Alternatively, I can ask for ssh access to a node within the network. That
> will surely work, but it's not my preferred solution.
>
> - Pieter
>
> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>
>> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>>
>> The ports to communicate with HDFS and the YARN resource manager may be
>> whitelisted or forwarded, so you can submit the YARN session, but then not
>> connect to the JobManager afterwards.
>>
>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>> wrote:
>>
>>> Hi Max!
>>>
>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>>> all in the JobManager Web UI looks good.
>>>
>>> It seems like the JobManager initiates the connection with my VM and
>>> cannot reach it. It could be that this is similar to the problem here:
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>
>>> I probably have to make some changes to the networking configuration of
>>> my VM so it can be reached by the JobManager despite using a different
>>> port each time.
>>>
>>> - Pieter
>>>
>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>
>>>> Hi Pieter,
>>>>
>>>> Which version of Flink are you using? It appears you've created a
>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>> wrote:
>>>> > Hi Robert,
>>>> >
>>>> > unfortunately there are no signs of what is going wrong in the logs.
>>>> > The last log messages are about successful registration of the
>>>> > TaskManagers.
>>>> >
>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>> > this, because when I start the yarn-session from a login node that is
>>>> > on the same network as the hadoop cluster there are no problems
>>>> > registering with the JobManager. I did also notice the following
>>>> > message in the local console:
>>>> >
>>>> > 12:30:27,173 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for
>>>> > 5000 ms, all messages to this address will be delivered to dead
>>>> > letters. Reason: connection timed out: /145.100.41.13:41539
>>>> >
>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>> > invalid or missing configuration on my side?
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Pieter
>>>> >
>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>> >> already what's going on.
>>>> >>
>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Guys!
>>>> >>>
>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm
>>>> >>> starting the Flink YARN session from an Ubuntu 14.04 VM. All goes
>>>> >>> well until after the JobManager web UI is started:
>>>> >>>
>>>> >>> JobManager web interface address
>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>> >>> Waiting until all TaskManagers have connected
>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Notification about new leader address
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session
>>>> >>> ID null.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Received address of new leader
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session
>>>> >>> ID null.
>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Disconnect from JobManager null.
>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Trying to register at JobManager
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>>
>>>> >>> It then hangs on these last steps (trying to register, no status
>>>> >>> updates..)
>>>> >>>
>>>> >>> I'm sure there must be a problem on my side that is causing me not
>>>> >>> to be able to register at the JobManager. What could cause such
>>>> >>> connection problems?
>>>> >>>
>>>> >>> Any tips are very welcome :-)
>>>> >>>
>>>> >>> Cheers and have a good weekend!
>>>> >>>
>>>> >>> - Pieter