After downloading and building 1.0-SNAPSHOT from the master branch, I run into another problem when starting a YARN cluster. The startup now loops indefinitely at the following step:
17:39:12,369 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
17:39:34,855 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm1

Any clue what could've gone wrong? I used all defaults when building with Maven.

- Pieter

2016-02-08 17:07 GMT+01:00 Pieter Hameete <phame...@gmail.com>:

> Matter of RTFM eh ;-) thx and sorry for the bother.
>
> 2016-02-08 17:06 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>
>> You said earlier that you are using Flink 0.10. The feature is only
>> available in 1.0-SNAPSHOT.
>>
>> On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <phame...@gmail.com>
>> wrote:
>>
>>> I've tried setting the yarn.application-master.port property in
>>> flink-conf.yaml to a range suggested in
>>> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>>>
>>> The JobManager does not seem to be picking the property up. Am I setting
>>> this in the wrong place? Or is there another way to enforce this property?
>>>
>>> Cheers,
>>>
>>> Pieter
>>>
>>> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <phame...@gmail.com>:
>>>
>>>> I found the relevant information on the website. I'll consult with the
>>>> cluster admin tomorrow, thanks for the help :-)
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>
>>>>> Hi,
>>>>>
>>>>> we had other users with a similar issue as well. There is a
>>>>> configuration value which allows you to specify a single port or a range
>>>>> of ports for the JobManager to allocate when running on YARN.
>>>>> Note that when using this with a single port, the JMs may collide.
>>>>>
>>>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Stephan,
>>>>>>
>>>>>> surely it seems this way! I must not be the first with this issue
>>>>>> though?
>>>>>> I'll have to contact the cluster admins to find a solution
>>>>>> together. What would be a way of making the JobManagers accessible from
>>>>>> outside the network, given that the IP and port number change every time?
>>>>>>
>>>>>> Alternatively, I can ask for ssh access to a node within the network.
>>>>>> That will surely work but it's not my preferred solution.
>>>>>>
>>>>>> - Pieter
>>>>>>
>>>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>>>
>>>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>>>> port.
>>>>>>>
>>>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>>>> be whitelisted or forwarded, so you can submit the YARN session, but then
>>>>>>> not connect to the JobManager afterwards.
>>>>>>>
>>>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Max!
>>>>>>>>
>>>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>>>> fine, all in the JobManager Web UI looks good.
>>>>>>>>
>>>>>>>> It seems like the JobManager initiates the connection with my VM
>>>>>>>> and cannot reach it. It could be that this is similar to the problem
>>>>>>>> here:
>>>>>>>>
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>>>
>>>>>>>> I probably have to make some changes to the networking
>>>>>>>> configuration of my VM so it can be reached by the JobManager despite
>>>>>>>> using a different port each time.
>>>>>>>>
>>>>>>>> - Pieter
>>>>>>>>
>>>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>>>>>>
>>>>>>>>> Hi Pieter,
>>>>>>>>>
>>>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Robert,
>>>>>>>>> >
>>>>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>>>>> >
>>>>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>>>>> > network as the hadoop cluster there are no problems registering with the
>>>>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>>>>> >
>>>>>>>>> > 12:30:27,173 WARN Remoting - Tried to associate with unreachable remote address
>>>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>>>> >
>>>>>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>>>>>> > invalid or missing configuration on my side?
>>>>>>>>> >
>>>>>>>>> > Cheers,
>>>>>>>>> >
>>>>>>>>> > Pieter
>>>>>>>>> >
>>>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>>>>>> >>
>>>>>>>>> >> Hi,
>>>>>>>>> >>
>>>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>>>>>> >> already what's going on.
>>>>>>>>> >>
>>>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hi Guys!
>>>>>>>>> >>>
>>>>>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM.
>>>>>>>>> >>> All goes well until after
>>>>>>>>> >>> the JobManager web UI is started:
>>>>>>>>> >>>
>>>>>>>>> >>> JobManager web interface address
>>>>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>>>> >>> 11:09:51,557 INFO org.apache.flink.yarn.ApplicationClient - Notification about new leader address
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>> 11:09:51,578 INFO org.apache.flink.yarn.ApplicationClient - Received address of new leader
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>> >>> 11:09:51,583 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
>>>>>>>>> >>> 11:09:51,595 INFO org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>>
>>>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>>>> >>> updates..)
>>>>>>>>> >>>
>>>>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>>>>>> >>> problems?
>>>>>>>>> >>>
>>>>>>>>> >>> Any tips are very welcome :-)
>>>>>>>>> >>>
>>>>>>>>> >>> Cheers and have a good weekend!
>>>>>>>>> >>>
>>>>>>>>> >>> - Pieter
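
Side note on the symptom discussed in this thread (ping works, but the connection to the JobManager's Akka port times out): ping only shows that the host answers ICMP, not that a given TCP port is open through the firewall. A minimal sketch of a TCP reachability check follows, assuming Python 3 is available on the client machine. The helper name can_connect is hypothetical and not part of Flink or YARN; it only tests whether a TCP handshake succeeds and says nothing about whether Akka registration itself would work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it checks plain TCP
    reachability, nothing Flink- or Akka-specific.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

One possible use is to call can_connect with the host and port from the "Trying to register at JobManager" log line, once from the VM and once from a login node inside the cluster network. If it returns False from the VM only, that would be consistent with the port being blocked between the two networks.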