Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Pieter Hameete Sun, 07 Feb 2016 11:05:27 -0800

I found the relevant information on the website. I'll consult with the
cluster admin tomorrow, thanks for the help :-)


- Pieter

2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetz...@apache.org>:

> Hi,
>
> we had other users with a similar issue as well. There is a configuration
> value which allows you to specify a single port or a range of ports for the
> JobManager to allocate when running on YARN.
> Note that when using this with a single port, the JMs may collide.
>
>
>
> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> surely it seems this way! I must not be the first with this issue though?
>> I'll have to contact the cluster admins to find a solution together. What
>> would be a way to make the JobManagers accessible from outside the network,
>> given that the IP and port number change every time?
>>
>> Alternatively, I can ask for ssh access to a node within the network.
>> That will surely work, but it's not my preferred solution.
>>
>> - Pieter
>>
>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>
>>> Yeah, sounds a lot like the client cannot connect to the JobManager port.
>>>
>>> The ports to communicate with HDFS and the YARN resource manager may be
>>> whitelisted or forwarded, so you can submit the YARN session, but then not
>>> connect to the JobManager afterwards.
>>>
>>>
>>>
>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>>> wrote:
>>>
>>>> Hi Max!
>>>>
>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created fine,
>>>> everything in the JobManager Web UI looks good.
>>>>
>>>> It seems like the JobManager initiates the connection with my VM and
>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>
>>>>
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>
>>>> I probably have to make some changes to the networking configuration of
>>>> my VM so it can be reached by the JobManager despite using a different port
>>>> each time.
>>>>
>>>> - Pieter
>>>>
>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>>
>>>>> Hi Pieter,
>>>>>
>>>>> Which version of Flink are you using? It appears you've created a
>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>>> wrote:
>>>>> > Hi Robert,
>>>>> >
>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>> >
>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>> > network as the hadoop cluster there are no problems registering with the
>>>>> > JobManager. I did also notice the following message in the local console:
>>>>> >
>>>>> > 12:30:27,173 WARN  Remoting
>>>>> > - Tried to associate with unreachable remote address
>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>> > connection timed out: /145.100.41.13:41539
>>>>> >
>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>> > invalid or missing configuration on my side?
>>>>> >
>>>>> > Cheers,
>>>>> >
>>>>> > Pieter
>>>>> >
>>>>> >
>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>> >> already what's going on.
>>>>> >>
>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hi Guys!
>>>>> >>>
>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>>> >>> the JobManager web UI is started:
>>>>> >>>
>>>>> >>> JobManager web interface address
>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>> >>> Waiting until all TaskManagers have connected
>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Notification about new leader address
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Received address of new leader
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Disconnect from JobManager null.
>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>> >>> - Trying to register at JobManager
>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>> >>>
>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>> >>> updates ...).
>>>>> >>>
>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>> >>> problems?
>>>>> >>>
>>>>> >>> Any tips are very welcome :-)
>>>>> >>>
>>>>> >>> Cheers and have a good weekend!
>>>>> >>>
>>>>> >>> - Pieter
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
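
Note on the symptom discussed above: a successful ping only shows that ICMP
gets through. It says nothing about whether the JobManager's TCP port is
reachable from the client. A minimal sketch of a TCP reachability check
follows. It assumes Python 3 is available on the client VM; the function name
can_connect is hypothetical and not part of Flink; the host and port in the
usage comment are simply the values from the log output quoted in the thread
and will differ for every YARN session.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only; it is not part of Flink.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (address copied from the quoted log; it changes per session):
#   can_connect("145.100.41.148", 35666)
```

If this returns False from the VM but True from a login node inside the
cluster network, that would be consistent with the firewall explanation given
in the thread, though it does not prove it.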
