Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Pieter Hameete Mon, 08 Feb 2016 08:51:27 -0800

After downloading and building the 1.0-SNAPSHOT from the master branch I do
run into another problem when starting a YARN cluster. The startup now
infinitely loops at the following step:


17:39:12,369 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm2
17:39:34,855 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm1

Any clue what could've gone wrong? I used all defaults when building with
Maven.

- Pieter



2016-02-08 17:07 GMT+01:00 Pieter Hameete <phame...@gmail.com>:

> Matter of RTFM eh ;-) thx and sorry for the bother.
>
> 2016-02-08 17:06 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>
>> You said earlier that you are using Flink 0.10. The feature is only
>> available in 1.0-SNAPSHOT.
>>
>> On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <phame...@gmail.com>
>> wrote:
>>
>>> I've tried setting the yarn.application-master.port property in
>>> flink-conf.yaml to a range as suggested in
>>> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>>>
>>> The JobManager does not seem to be picking the property up. Am I setting
>>> this in the wrong place? Or is there another way to enforce this property?
>>>
>>> Cheers,
>>>
>>> Pieter
>>>
>>> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <phame...@gmail.com>:
>>>
>>>> I found the relevant information on the website. I'll consult with the
>>>> cluster admin tomorrow, thanks for the help :-)
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>
>>>>> Hi,
>>>>>
>>>>> we had other users with a similar issue as well. There is a
>>>>> configuration value which allows you to specify a single port or a range
>>>>> of ports for the JobManager to allocate when running on YARN.
>>>>> Note that when using this with a single port, the JMs may collide.
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Stephan,
>>>>>>
>>>>>> surely it seems this way! I must not be the first with this issue
>>>>>> though? I'll have to contact the cluster admins to find a solution
>>>>>> together. What would be a way to make the JobManagers accessible from
>>>>>> outside the network, given that the IP and port number change every time?
>>>>>>
>>>>>> Alternatively, I can ask for ssh access to a node within the network.
>>>>>> That will surely work but it's not my preferred solution.
>>>>>>
>>>>>> - Pieter
>>>>>>
>>>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>>>
>>>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>>>> port.
>>>>>>>
>>>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>>>> be whitelisted or forwarded, so you can submit the YARN session, but then
>>>>>>> not connect to the JobManager afterwards.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Max!
>>>>>>>>
>>>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>>>> fine; everything in the JobManager Web UI looks good.
>>>>>>>>
>>>>>>>> It seems like the JobManager initiates the connection with my VM
>>>>>>>> and cannot reach it. It could be that this is similar to the problem
>>>>>>>> here:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>>>
>>>>>>>> I probably have to make some changes to the networking
>>>>>>>> configuration of my VM so it can be reached by the JobManager despite
>>>>>>>> using a different port each time.
>>>>>>>>
>>>>>>>> - Pieter
>>>>>>>>
>>>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>>>>>>
>>>>>>>>> Hi Pieter,
>>>>>>>>>
>>>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Robert,
>>>>>>>>> >
>>>>>>>>> > unfortunately there are no signs of what is going wrong in the logs.
>>>>>>>>> > The last log messages are about successful registration of the
>>>>>>>>> > TaskManagers.
>>>>>>>>> >
>>>>>>>>> > I'm also fairly sure it must be something in my VM that is causing
>>>>>>>>> > this, because when I start the yarn-session from a login node that is
>>>>>>>>> > on the same network as the Hadoop cluster there are no problems
>>>>>>>>> > registering with the JobManager. I did also notice the following
>>>>>>>>> > message in the local console:
>>>>>>>>> >
>>>>>>>>> > 12:30:27,173 WARN  Remoting - Tried to associate with unreachable
>>>>>>>>> > remote address [akka.tcp://flink@145.100.41.13:41539]. Address is now
>>>>>>>>> > gated for 5000 ms, all messages to this address will be delivered to
>>>>>>>>> > dead letters. Reason: connection timed out: /145.100.41.13:41539
>>>>>>>>> >
>>>>>>>>> > I can ping the JobManager fine from within the VM. Could there be
>>>>>>>>> > some invalid or missing configuration on my side?
>>>>>>>>> >
>>>>>>>>> > Cheers,
>>>>>>>>> >
>>>>>>>>> > Pieter
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>>>>>> >>
>>>>>>>>> >> Hi,
>>>>>>>>> >>
>>>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>>>>>> >> already what's going on.
>>>>>>>>> >>
>>>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hi Guys!
>>>>>>>>> >>>
>>>>>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm
>>>>>>>>> >>> starting the Flink YARN session from an Ubuntu 14.04 VM. All goes
>>>>>>>>> >>> well until after the JobManager web UI is started:
>>>>>>>>> >>>
>>>>>>>>> >>> JobManager web interface address
>>>>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>>> >>> - Notification about new leader address
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>>> >>> - Received address of new leader
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>>> >>> - Disconnect from JobManager null.
>>>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>>> >>> - Trying to register at JobManager
>>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>>> >>>
>>>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>>>> >>> updates..)
>>>>>>>>> >>>
>>>>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to
>>>>>>>>> >>> be able to register at the JobManager. What could cause such
>>>>>>>>> >>> connection problems?
>>>>>>>>> >>>
>>>>>>>>> >>> Any tips are very welcome :-)
>>>>>>>>> >>>
>>>>>>>>> >>> Cheers and have a good weekend!
>>>>>>>>> >>>
>>>>>>>>> >>> - Pieter
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
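
The thread above comes down to two observations: ping to the JobManager host
works from the VM, but the TCP connection to its (changing) Akka port times
out, and the suggested remedy is to pin the port to a range via
yarn.application-master.port. A minimal sketch for checking from the client
machine which ports of such a range can actually be opened is given below. It
is not part of Flink; the function names and the example range are
assumptions made for illustration, and only the Python standard library is
used.

```python
import socket


def parse_port_spec(spec):
    """Expand a spec such as "50100-50102,50200" into a sorted list of ports.

    The "single port or range" format mirrors what the thread describes for
    yarn.application-master.port; the parser itself is illustrative only.
    """
    ports = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(p) for p in part.split("-", 1))
            ports.update(range(low, high + 1))
        else:
            ports.add(int(part))
    return sorted(ports)


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Unlike ping (ICMP), this exercises the same kind of TCP connection the
    client needs, so a firewall that blocks the port shows up as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False


def report(host, spec, timeout=3.0):
    """Map each port in spec to True/False depending on TCP reachability."""
    return {port: is_reachable(host, port, timeout)
            for port in parse_port_spec(spec)}
```

For example, report("145.100.41.148", "50100-50110") would probe the
JobManager host from the log above against a hypothetical range; any port
reported as False would still need to be opened or forwarded by the cluster
admins before the client can register.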
