Re: Flink on YARN: Stuck on "Trying to register at JobManager"

Pieter Hameete Mon, 08 Feb 2016 08:08:48 -0800

Matter of RTFM eh ;-) thx and sorry for the bother.

2016-02-08 17:06 GMT+01:00 Robert Metzger <rmetz...@apache.org>:


> You said earlier that you are using Flink 0.10. The feature is only
> available in 1.0-SNAPSHOT.
>
> On Mon, Feb 8, 2016 at 4:53 PM, Pieter Hameete <phame...@gmail.com> wrote:
>
>> I've tried setting the yarn.application-master.port property in
>> flink-conf.yaml to a range suggested in
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#running-flink-on-yarn-behind-firewalls
>>
>> The JobManager does not seem to be picking the property up. Am I setting
>> this in the wrong place? Or is there another way to enforce this property?
>>
>> Cheers,
>>
>> Pieter
>>
>> 2016-02-07 20:04 GMT+01:00 Pieter Hameete <phame...@gmail.com>:
>>
>>> I found the relevant information on the website. I'll consult with the
>>> cluster admin tomorrow, thanks for the help :-)
>>>
>>> - Pieter
>>>
>>> 2016-02-07 19:31 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>
>>>> Hi,
>>>>
>>>> we had other users with a similar issue as well. There is a
>>>> configuration value which allows you to specify a single port or a range of
>>>> ports for the JobManager to allocate when running on YARN.
>>>> Note that when using this with a single port, the JMs may collide.
>>>>
>>>>
>>>>
>>>> On Sun, Feb 7, 2016 at 7:25 PM, Pieter Hameete <phame...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Stephan,
>>>>>
>>>>> surely it seems this way! I must not be the first with this issue
>>>>> though? I'll have to contact the cluster admins to find a solution
>>>>> together. What would be a way to make the JobManagers accessible from
>>>>> outside the network, given that the IP and port number change every time?
>>>>>
>>>>> Alternatively, I can ask for ssh access to a node within the network.
>>>>> That will surely work but it's not my preferred solution.
>>>>>
>>>>> - Pieter
>>>>>
>>>>> 2016-02-06 16:22 GMT+01:00 Stephan Ewen <se...@apache.org>:
>>>>>
>>>>>> Yeah, sounds a lot like the client cannot connect to the JobManager
>>>>>> port.
>>>>>>
>>>>>> The ports to communicate with HDFS and the YARN resource manager may
>>>>>> be whitelisted or forwarded, so you can submit the YARN session, but then
>>>>>> not connect to the JobManager afterwards.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Feb 6, 2016 at 2:11 PM, Pieter Hameete <phame...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Max!
>>>>>>>
>>>>>>> I'm using Flink 0.10.1 and indeed the cluster seems to be created
>>>>>>> fine, everything in the JobManager web UI looks good.
>>>>>>>
>>>>>>> It seems like the JobManager initiates the connection with my VM and
>>>>>>> cannot reach it. It could be that this is similar to the problem here:
>>>>>>>
>>>>>>>
>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-with-docker-errors-with-akka-NAT-td7702.html
>>>>>>>
>>>>>>> I probably have to make some changes to the networking configuration
>>>>>>> of my VM so it can be reached by the JobManager despite using a
>>>>>>> different port each time.
>>>>>>>
>>>>>>> - Pieter
>>>>>>>
>>>>>>> 2016-02-06 14:05 GMT+01:00 Maximilian Michels <m...@apache.org>:
>>>>>>>
>>>>>>>> Hi Pieter,
>>>>>>>>
>>>>>>>> Which version of Flink are you using? It appears you've created a
>>>>>>>> Flink YARN cluster but you can't reach the JobManager afterwards.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Robert,
>>>>>>>> >
>>>>>>>> > unfortunately there are no signs of what is going wrong in the logs. The
>>>>>>>> > last log messages are about successful registration of the TaskManagers.
>>>>>>>> >
>>>>>>>> > I'm also fairly sure it must be something in my VM that is causing this,
>>>>>>>> > because when I start the yarn-session from a login node that is on the same
>>>>>>>> > network as the hadoop cluster there are no problems registering with the
>>>>>>>> > JobManager. I did also notice the following message in the local console:
>>>>>>>> >
>>>>>>>> > 12:30:27,173 WARN  Remoting
>>>>>>>> > - Tried to associate with unreachable remote address
>>>>>>>> > [akka.tcp://flink@145.100.41.13:41539]. Address is now gated for 5000 ms,
>>>>>>>> > all messages to this address will be delivered to dead letters. Reason:
>>>>>>>> > connection timed out: /145.100.41.13:41539
>>>>>>>> >
>>>>>>>> > I can ping the JobManager fine from within the VM. Could there be some
>>>>>>>> > invalid or missing configuration on my side?
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> >
>>>>>>>> > Pieter
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2016-02-06 12:54 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>>>>>>>> >>
>>>>>>>> >> Hi,
>>>>>>>> >>
>>>>>>>> >> did you check the logs of the JobManager itself? Maybe it'll tell us
>>>>>>>> >> already what's going on.
>>>>>>>> >>
>>>>>>>> >> On Sat, Feb 6, 2016 at 12:14 PM, Pieter Hameete <phame...@gmail.com>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hi Guys!
>>>>>>>> >>>
>>>>>>>> >>> I'm attempting to run Flink on YARN, but I run into an issue. I'm starting
>>>>>>>> >>> the Flink YARN session from an Ubuntu 14.04 VM. All goes well until after
>>>>>>>> >>> the JobManager web UI is started:
>>>>>>>> >>>
>>>>>>>> >>> JobManager web interface address
>>>>>>>> >>> http://head05.hathi.surfsara.nl:8088/proxy/application_1452780322684_10532/
>>>>>>>> >>> Waiting until all TaskManagers have connected
>>>>>>>> >>> 11:09:51,557 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Notification about new leader address
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>> >>> 11:09:51,578 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Received address of new leader
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager with session ID null.
>>>>>>>> >>> 11:09:51,583 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Disconnect from JobManager null.
>>>>>>>> >>> 11:09:51,595 INFO  org.apache.flink.yarn.ApplicationClient
>>>>>>>> >>> - Trying to register at JobManager
>>>>>>>> >>> akka.tcp://flink@145.100.41.148:35666/user/jobmanager.
>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>> >>> No status updates from the YARN cluster received so far. Waiting ...
>>>>>>>> >>>
>>>>>>>> >>> It then hangs on these last steps (trying to register, no status
>>>>>>>> >>> updates..)
>>>>>>>> >>>
>>>>>>>> >>> I'm sure there must be a problem on my side that is causing me not to be
>>>>>>>> >>> able to register at the JobManager. What could cause such connection
>>>>>>>> >>> problems?
>>>>>>>> >>>
>>>>>>>> >>> Any tips are very welcome :-)
>>>>>>>> >>>
>>>>>>>> >>> Cheers and have a good weekend!
>>>>>>>> >>>
>>>>>>>> >>> - Pieter
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
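
The symptom described in this thread (ping to the JobManager host works, while Akka reports "connection timed out") is consistent with ICMP being allowed and the JobManager's TCP port being blocked. Below is a minimal sketch for checking this from the client machine, assuming Python 3 is available there. The function name tcp_reachable is made up for illustration and is not part of Flink; the host and port in the commented call are the ones from the log quoted above.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only; it is not a Flink API.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable routes.
        return False


# Example call, using the JobManager address from the client log:
# tcp_reachable("145.100.41.148", 35666)
```

If this returns False for the JobManager address while ping succeeds, a firewall between the client and the cluster is a plausible cause. In that case a fixed port or port range (the yarn.application-master.port setting mentioned above, which per this thread is only available in 1.0-SNAPSHOT) gives the cluster admins something concrete to open.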
