This looks like the reason:

java.net.UnknownHostException: Cannot resolve the JobManager hostname
'hostname-of-master' specified in the configuration
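
The name 'hostname-of-master' may be a template placeholder rather than a real machine name. If that literal string is the value of jobmanager.rpc.address in flink-conf.yaml, it would need to be replaced with the master's actual hostname or IP address. Either way, the worker has to be able to resolve whatever name is configured. Below is a minimal sketch of such a check, assuming Python 3 is available on the worker. resolve_host is an illustrative helper, not part of Flink, and it uses the operating system's resolver, which may not behave identically to the JVM's lookup.

```python
import socket


def resolve_host(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved.

    The resolver argument is a hypothetical hook so the lookup can be
    replaced in tests; by default the system resolver is used.
    """
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError
        return None
```

Calling resolve_host("hostname-of-master") on the worker returns None if the name cannot be resolved there. In that case an /etc/hosts entry or a DNS record for the master is probably missing, or the configured name is wrong.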

On Wed, Feb 3, 2016 at 7:29 PM, Ravinder Kaur <[email protected]> wrote:

> Hello,
>
> The log file of the Taskmanager now shows the following
>
> 18:27:10,082 WARN  org.apache.hadoop.util.NativeCodeLoader
>       - Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -
> --------------------------------------------------------------------------------
> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Starting TaskManager (Version: 0.10.1, Rev:2e9b231,
> Date:22.11.2015 @ 12:41:12 CET)
> 18:27:10,244 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Current user: flink
> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.91-b01
> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Maximum heap size: 491 MiBytes
> 18:27:10,245 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  JAVA_HOME: /usr/lib/jvm/java-1.7.0-openjdk-amd64
> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Hadoop version: 2.7.0
> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  JVM Options:
> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     -Xms512M
> 18:27:10,247 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     -Xmx512M
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     -XX:MaxDirectMemorySize=8388607T
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     -XX:MaxPermSize=256m
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -
> -Dlog.file=/home/flink/flink-0.10.1/log/flink-flink-taskmanager-0-vm-10-155-208-137.cloud.mwn.de.log
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -
> -Dlog4j.configuration=file:/home/flink/flink-0.10.1/conf/log4j.properties
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -
> -Dlogback.configurationFile=file:/home/flink/flink-0.10.1/conf/logback.xml
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Program Arguments:
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     --configDir
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     /home/flink/flink-0.10.1/conf
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     --streamingMode
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -     batch
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -  Classpath:
> /home/flink/flink-0.10.1/lib/flink-dist_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/flink-python_2.11-0.10.1.jar:/home/flink/flink-0.10.1/lib/log4j-1.2.17.jar:/home/flink/flink-0.10.1/lib/slf4j-log4j12-1.7.7.jar:/usr/lib/jvm/java-1.7.0-openjdk-amd64/lib/tools.jar::
> 18:27:10,248 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        -
> --------------------------------------------------------------------------------
> 18:27:10,252 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Maximum number of open file descriptors is 4096
> 18:27:10,277 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Loading configuration from /home/flink/flink-0.10.1/conf
> 18:27:10,356 INFO  org.apache.flink.runtime.taskmanager.TaskManager
>        - Security is not enabled. Starting non-authenticated TaskManager.
> 18:27:10,365 ERROR org.apache.flink.runtime.taskmanager.TaskManager
>        - Failed to run TaskManager.
> java.net.UnknownHostException: Cannot resolve the JobManager hostname
> 'hostname-of-master' specified in the configuration
>         at
> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:79)
>         at
> org.apache.flink.runtime.util.StandaloneUtils.createLeaderRetrievalService(StandaloneUtils.java:48)
>         at
> org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:69)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndPort(TaskManager.scala:1351)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1328)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager$.main(TaskManager.scala:1240)
>         at
> org.apache.flink.runtime.taskmanager.TaskManager.main(TaskManager.scala)
>
> Kind Regards,
> Ravinder Kaur
>
> On Wed, Feb 3, 2016 at 7:19 PM, Stephan Ewen <[email protected]> wrote:
>
>> What do the TaskManager logs say?
>>
>> On Wed, Feb 3, 2016 at 6:34 PM, Ravinder Kaur <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> Thanks for the quick reply. I tried setting jobmanager.rpc.address in
>>> flink-conf.yaml to the hostname of the master node on both nodes.
>>>
>>> Now the TaskManager does not start on the worker node at all. When I
>>> start the cluster using ./bin/start-cluster.sh on the master, it shows the
>>> normal output of starting the JobManager and TaskManager, but when I run
>>> jps on the nodes, the worker does not have the TaskManager running.
>>>
>>> Running the WordCount example again fails with the same error.
>>> Stopping the cluster reports no TaskManager to stop.
>>>
>>> Kind Regards,
>>> Ravinder Kaur
>>>
>>> On Wed, Feb 3, 2016 at 5:47 PM, Stephan Ewen <[email protected]> wrote:
>>>
>>>> Looks like the network configuration is not correct.
>>>>
>>>> I would try setting the full host name (like "master.abc.xyz.com") as
>>>> jobmanager.rpc.address.
>>>>
>>>> Greetings,
>>>> Stephan
>>>>
>>>>
>>>> On Wed, Feb 3, 2016 at 5:43 PM, Ravinder Kaur <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> Hello Community,
>>>>>
>>>>> I'm a student and new to Apache Flink. I'm trying to learn and have
>>>>> set up a 2-node standalone Flink (0.10.1) cluster (one master and one
>>>>> worker). I'm facing the following issue.
>>>>>
>>>>> Cluster: 2 VMs (one master and one worker)
>>>>>
>>>>> The configurations are done as per
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/cluster_setup.html
>>>>>
>>>>> When I start the cluster both the JobManager and the TaskManager are
>>>>> started on the master and worker respectively.
>>>>>
>>>>> Command to start the cluster : bin/start-cluster.sh
>>>>>
>>>>> JPS shows all the processes running.
>>>>>
>>>>> Then I run the following command to run a WordCount example job:
>>>>> ./bin/flink run ./examples/WordCount.jar
>>>>>
>>>>> The result is attached to this mail.
>>>>>
>>>>> The error is
>>>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>>>> Not enough free slots available to run the job
>>>>> ....................... Resources available to scheduler: Number of
>>>>> instances=0, total number of slots=0, available slots=0
>>>>>
>>>>> Therefore I suppose that the JobManager does not find the TaskManager.
>>>>> I checked the logs of the TaskManager, which indeed show that the
>>>>> TaskManager is unable to register at the JobManager for quite a long
>>>>> time. The log contains these messages:
>>>>>
>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from
>>>>> localhost: Connect timed out
>>>>>
>>>>> org.apache.flink.runtime.net.ConnectionUtils: Failed to connect from
>>>>> address localhost: Network is Unreachable
>>>>>
>>>>> Later, after a number of attempts, it starts up and tries to register
>>>>> at the JobManager, which also fails after a lot of attempts with the
>>>>> following messages:
>>>>>
>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Trying to register at
>>>>> JobManager akka.tcp://flink@master:6123/user/jobmanager (attempt:92,
>>>>> timeout:30seconds)
>>>>>
>>>>> org.apache.flink.runtime.taskmanager.TaskManager: Tried to associate
>>>>> with unreachable remote host
>>>>> [akka.tcp://flink@master:6123/user/jobmanager]. Address is now gated for
>>>>> 5000ms, all messages to this address will be delivered to dead letters.
>>>>> Reason: Connection timed out: /master:6123
>>>>> I browsed the internet for these and found the links
>>>>> http://stackoverflow.com/questions/33601020/flink-job-wont-run-with-higher-taskmanager-heap-mb
>>>>> and https://issues.apache.org/jira/browse/FLINK-1119 helpful. Stephan
>>>>> Ewen, who provided the solution in both links, gives a good explanation
>>>>> that the TaskManagers take quite some time to register at the
>>>>> JobManager. Therefore I waited for as long as 20 minutes after starting
>>>>> the cluster before running the job, but even after waiting that long I
>>>>> get the same error.
>>>>>
>>>>> Another suggestion was to run the cluster in streaming mode. So I
>>>>> tried it with the command bin/start-cluster-streaming.sh and ran
>>>>> the job, but I get the same error.
>>>>>
>>>>> I re-checked all the configurations but could not find anything wrong.
>>>>> I also checked port 6123 on the master, which is in LISTEN state, while
>>>>> the TCP request from the worker to the master shows SYN_SENT state
>>>>> (using the netstat -na and lsof -i commands).
>>>>>
>>>>> I opened the webpage on master http://localhost:8081 but it shows
>>>>> nothing and localhost:8080 says connection refused.
>>>>>
>>>>> Kindly help me out as it is very important for me. Let me know if you
>>>>> have any questions.
>>>>>
>>>>> Kind Regards,
>>>>> Ravinder Kaur
>>>>>
>>>>>
>>>>
>>>
>>
>
