Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Robert Metzger Sun, 07 Feb 2016 10:32:29 -0800

Hi,

we had other users with a similar issue as well. There is a configuration
value which allows you to specify a single port or a range of ports for the
JobManager to allocate when running on YARN.
Note that when using this with a single port, the JobManagers of multiple
YARN sessions may collide if they are placed on the same host.
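
Once a fixed port or range is opened in the firewall, it can help to check
from the client machine whether those ports are reachable over TCP, since a
successful ping says nothing about that. Below is a minimal sketch, not part
of Flink. The helper names `parse_port_range` and `probe` are hypothetical.
As far as I know the relevant key is `yarn.application-master.port` in Flink
1.0 and later, but please verify the name and the accepted range syntax
against the documentation for your version.

```python
import socket


def parse_port_range(spec):
    """Parse "50100" or "50100-50200" into an inclusive list of ports."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    if not (0 < start <= end <= 65535):
        raise ValueError("invalid port range: %r" % spec)
    return list(range(start, end + 1))


def probe(host, port, timeout=2.0):
    """Classify a TCP port as seen from this machine.

    "open":     something accepted the connection.
    "closed":   the host refused the connection, so packets get through
                but nothing is listening on that port.
    "filtered": timeout or another network error, which is what a
                firewall that silently drops packets typically looks like.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except OSError:  # includes socket.timeout
        return "filtered"
```

For example, `[p for p in parse_port_range("50100-50200") if probe(host, p) == "filtered"]`
would list the ports in an assumed range that still appear blocked, where
`host` is the machine running the JobManager. The range 50100-50200 is only
an example value.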




On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com> wrote:

> Hi Stephan,
>
> surely it seems this way! I must not be the first with this issue though?
> I'll have to contact the cluster admins to find a solution together. What
> would be a way to make the JobManagers accessible from outside the network,
> given that the IP and port number change every time?
>
> Alternatively, I can ask for ssh access to a node within the network. That
> will surely work, but it's not my preferred solution.
>
> - Pieter
>
> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>
>> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>>
>> The ports to communicate with HDFS and the YARN resource manager may be
>> whitelisted or forwarded, so you can submit the YARN session, but then not
>> connect to the JobManager afterwards.
>>
>>
>>
>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>> wrote:
>>
>>> Hi Max!
>>>
>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine;
>>> everything in the JobManager web UI looks good.
>>>
>>> It seems like the JobManager initiates the connection with my VM and
>>> cannot reach it. It could be that this is similar to the problem here:
>>>
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>
>>> I probably have to make some changes to the networking configuration of
>>> my VM so it can be reached by the JobManager despite using a different port
>>> each time.
>>>
>>> - Pieter
>>>
>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>
>>>> Hi Pieter,
>>>>
>>>> Which version of Flink are you using? It appears you've created a
>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>> wrote:
>>>> > Hi Robert,
>>>> >
>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>> > last log messages are about successful registration of the TaskManagers.
>>>> >
>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>> > because when I start the yarn-session from a login node that is on the same
>>>> > network as the Hadoop cluster there are no problems registering with the
>>>> > JobManager. I did also notice the following message in the local console:
>>>> >
>>>> > 12:30:27,173 WARN  Remoting
>>>> > - Tried to associate with unreachable remote address
>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>> > connection timed out: /145.100.41.13:41539
>>>> >
>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>> > invalid or missing configuration on my side?
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Pieter
>>>> >
>>>> >
>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>> >> already what's going on.
>>>> >>
>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Guys!
>>>> >>>
>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>> >>> the JobManager web UI is started:
>>>> >>>
>>>> >>> JobManager web interface address
>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>> >>> Waiting until all TaskManagers have connected
>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Notification about new leader address
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Received address of new leader
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Disconnect from JobManager null.
>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>> >>> - Trying to register at JobManager
>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>> >>>
>>>> >>> It then hangs on these last steps (trying to register, no status
>>>> >>> updates...).
>>>> >>>
>>>> >>> I'm sure there must be a problem on my side that is preventing me from
>>>> >>> registering at the JobManager. What could cause such connection
>>>> >>> problems?
>>>> >>>
>>>> >>> Any tips are very welcome :-)
>>>> >>>
>>>> >>> Cheers and have a good weekend!
>>>> >>>
>>>> >>> - Pieter
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
