Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Pieter Hameete Mon, 08 Feb 2016 08:09:20 -0800

I've tried setting the yarn.application-master.port property in
flink-conf.yaml to a range, as suggested in
https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls


The JobManager does not seem to be picking the property up. Am I setting
this in the wrong place? Or is there another way to enforce this property?
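
A minimal sketch of how the configured range could be checked from the
client side, assuming Python 3 is available on the VM. The function names
and the idea of probing the range are illustrative assumptions, not part of
Flink. A successful ping only shows that ICMP gets through, so the sketch
opens a TCP connection to each port instead:

```python
import socket


def parse_port_range(spec):
    """Expand a spec such as "50100-50200" or "50010,50020-50022" into a list of ints."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # "start-end" is treated as an inclusive range.
            start, end = part.split("-", 1)
            ports.extend(range(int(start), int(end) + 1))
        else:
            ports.append(int(part))
    return ports


def reachable_ports(host, ports, timeout=2.0):
    """Return the ports on host that accept a TCP connection within timeout seconds."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered or timed out: not reachable from this machine.
            pass
    return open_ports
```

For example, reachable_ports("145.100.41.148", parse_port_range("50100-50200"))
would list the ports in a hypothetical 50100-50200 range that the VM can
reach on the JobManager host from the log. If the JobManager still announces
a port outside the configured range, the property was probably not picked up.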

Cheers,

Pieter

2016-02-07 20:04 GMT+01:00 Pieter Hameete <phame...@gmail.com>:

> I found the relevant information on the website. I'll consult with the
> cluster admin tomorrow, thanks for the help :-)
>
> - Pieter
>
> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>
>> Hi,
>>
>> we had other users with a similar issue as well. There is a configuration
>> value which allows you to specify a single port or a range of ports for the
>> JobManager to allocate when running on YARN.
>> Note that when using this with a single port, the JMs may collide.
>>
>>
>>
>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com>
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> surely it seems this way! I must not be the first with this issue
>>> though? I'll have to contact the cluster admins to find a solution
>>> together. What would be a way to make the JobManagers accessible from
>>> outside the network, given that the IP and port number change every time?
>>>
>>> Alternatively, I can ask for ssh access to a node within the network.
>>> That will surely work, but it's not my preferred solution.
>>>
>>> - Pieter
>>>
>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>
>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>> port.
>>>>
>>>> The ports to communicate with HDFS and the YARN resource manager may be
>>>> whitelisted or forwarded, so you can submit the YARN session, but then not
>>>> connect to the JobManager afterwards.
>>>>
>>>>
>>>>
>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Max!
>>>>>
>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>> fine, all in the JobManager Web UI looks good.
>>>>>
>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>
>>>>>
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>
>>>>> I probably have to make some changes to the networking configuration
>>>>> of my VM so it can be reached by the JobManager despite using a different
>>>>> port each time.
>>>>>
>>>>> - Pieter
>>>>>
>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>>>
>>>>>> Hi Pieter,
>>>>>>
>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Robert,
>>>>>> >
>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>> >
>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>> > network as the hadoop cluster there are no problems registering with the
>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>> >
>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>> > - Tried to associate with unreachable remote address
>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>> >
>>>>>> > I can ping the JobManager fine from within the VM. Could there be some invalid or
>>>>>> > missing configuration on my side?
>>>>>> >
>>>>>> > Cheers,
>>>>>> >
>>>>>> > Pieter
>>>>>> >
>>>>>> >
>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>>> >> already what's going on.
>>>>>> >>
>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hi Guys!
>>>>>> >>>
>>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>>>> >>> the JobManager web UI is started:
>>>>>> >>>
>>>>>> >>> JobManager web interface address
>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Notification about new leader address
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Received address of new leader
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Disconnect from JobManager null.
>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>> >>> - Trying to register at JobManager
>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>> >>>
>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>> >>> updates...).
>>>>>> >>>
>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>>> >>> problems?
>>>>>> >>>
>>>>>> >>> Any tips are very welcome :-)
>>>>>> >>>
>>>>>> >>> Cheers and have a good weekend!
>>>>>> >>>
>>>>>> >>> - Pieter
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
