Hi Stephan, it certainly seems that way! I can't be the first to run into this issue, though. I'll have to contact the cluster admins to work out a solution together. What would be a way to make the JobManager accessible from outside the network, given that its IP and port number change every time?
Alternatively, I can ask for SSH access to a node within the network. That will surely work, but it's not my preferred solution.

- Pieter

2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:

> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>
> The ports to communicate with HDFS and the YARN resource manager may be
> whitelisted or forwarded, so you can submit the YARN session, but then not
> connect to the JobManager afterwards.
>
> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com> wrote:
>
>> Hi Max!
>>
>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>> everything in the JobManager Web UI looks good.
>>
>> It seems like the JobManager initiates the connection with my VM and
>> cannot reach it. It could be that this is similar to the problem here:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>
>> I probably have to make some changes to the networking configuration of
>> my VM so it can be reached by the JobManager despite using a different
>> port each time.
>>
>> - Pieter
>>
>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>
>>> Hi Pieter,
>>>
>>> Which version of Flink are you using? It appears you've created a
>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com> wrote:
>>> > Hi Robert,
>>> >
>>> > unfortunately there are no signs of what is going wrong in the logs.
>>> > The last log messages are about successful registration of the
>>> > TaskManagers.
>>> >
>>> > I'm also fairly sure it must be something in my VM that is causing
>>> > this, because when I start the yarn-session from a login node that is
>>> > on the same network as the Hadoop cluster there are no problems
>>> > registering with the JobManager.
>>> >
>>> > I did also notice the following message in the local console:
>>> >
>>> > 12:30:27,173 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: connection timed out: /145.100.41.13:41539
>>> >
>>> > I can ping the JobManager fine from within the VM. Could there be some
>>> > invalid or missing configuration on my side?
>>> >
>>> > Cheers,
>>> >
>>> > Pieter
>>> >
>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>> >>
>>> >> Hi,
>>> >>
>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>> >> already what's going on.
>>> >>
>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Guys!
>>> >>>
>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm
>>> >>> starting the Flink YARN session from an Ubuntu 14.04 VM. All goes
>>> >>> well until after the JobManager web UI is started:
>>> >>>
>>> >>> JobManager web interface address
>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>> >>> Waiting until all TaskManagers have connected
>>> >>> 11:09:51,557 INFO org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>> 11:09:51,578 INFO org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>> >>> 11:09:51,583 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
>>> >>> 11:09:51,595 INFO org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>> >>>
>>> >>> It then hangs on these last steps (trying to register, no status
>>> >>> updates...).
>>> >>>
>>> >>> I'm sure there must be a problem on my side that is causing me not
>>> >>> to be able to register at the JobManager. What could cause such
>>> >>> connection problems?
>>> >>>
>>> >>> Any tips are very welcome :-)
>>> >>>
>>> >>> Cheers and have a good weekend!
>>> >>>
>>> >>> - Pieter
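
A successful ping only shows that the host answers ICMP; the "connection timed out" warning in the thread concerns a TCP port, which a firewall may still block. One way to check this from the VM is a direct TCP connection attempt to the address printed in the client log. The following is a minimal Python sketch and not part of Flink; the helper names `jobmanager_endpoint` and `can_connect` are hypothetical, and the 5-second default timeout is an arbitrary assumption.

```python
import socket
from urllib.parse import urlsplit


def jobmanager_endpoint(akka_url: str) -> tuple[str, int]:
    """Extract (host, port) from an address as printed in the client log,
    e.g. akka.tcp://flink@<host>:<port>/user/jobmanager.
    (Hypothetical helper, not a Flink API.)"""
    parts = urlsplit(akka_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host:port found in {akka_url!r}")
    return parts.hostname, parts.port


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened
    within `timeout` seconds, False on refusal, timeout, or other
    socket-level errors."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect(*jobmanager_endpoint("akka.tcp://flink@145.100.41.148:35666/user/jobmanager"))` tries the address from the log above. If it returns False from the VM but True from a login node inside the cluster network, that would be consistent with the port being blocked between the two networks, though it does not prove it.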