Dear all, I ran into trouble starting Flink in a YARN container on Cloudera. My start script looks like this:

slaxxxx:/applvg/home/flink/mvp $ cat run.sh
export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/
export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
/applvg/home/flink/mvp/flink-1.2.0/bin/yarn-session.sh -n 4 -s 3 -st -jm 2048 -tm 2048 -qu root.mr-spark.avp -d

When I execute this script, the output looks like this:

sla09037:/applvg/home/flink/mvp $ ./run.sh
2017-05-11 15:13:24,541 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:24,542 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:24,543 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:24,571 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:24,572 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:25,000 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to fl...@companyde.rootdom.net (auth:KERBEROS)
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-05-11 15:13:25,030 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-05-11 15:13:25,031 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-05-11 15:13:25,050 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 4
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 2048
2017-05-11 15:13:25,051 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 2048
2017-05-11 15:13:25,903 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-05-11 15:13:25,962 WARN org.apache.flink.yarn.YarnClusterDescriptor - The configuration directory ('/applvg/home/flink/mvp/flink-1.2.0/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2017-05-11 15:13:25,972 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/lib
2017-05-11 15:13:27,522 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/log4j.properties to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/log4j.properties
2017-05-11 15:13:27,552 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/conf/logback.xml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/logback.xml
2017-05-11 15:13:27,584 INFO org.apache.flink.yarn.Utils - Copying from file:/applvg/home/flink/mvp/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-dist_2.11-1.2.0.jar
2017-05-11 15:13:28,508 INFO org.apache.flink.yarn.Utils - Copying from /applvg/home/flink/mvp/flink-1.2.0/conf/flink-conf.yaml to hdfs://nameservice1/user/flink/.flink/application_1493762518335_0216/flink-conf.yaml
2017-05-11 15:13:28,553 INFO org.apache.flink.yarn.YarnClusterDescriptor - Adding delegation token to the AM container..
2017-05-11 15:13:28,563 INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 27247 for flink on ha-hdfs:nameservice1
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:421)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: lfrar256.srv.company;lfrar257.srv.company
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationTokenService(KMSClientProvider.java:823)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:779)
    at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
    at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2046)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
    at org.apache.flink.yarn.Utils.setTokensFor(Utils.java:154)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:753)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:419)
    ... 9 more
Caused by: java.net.UnknownHostException: lfrarXXX1.srv.company;lfrarXXX2.srv.company
    ... 20 more

It seems that Flink found these hosts here:

slaxxxxx:/applvg/home/flink/mvp $ grep -r "lfrarXXX1.srv.company;lfrarXXX2.srv.company" /etc/hadoop/conf
/etc/hadoop/conf/core-site.xml: <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>
/etc/hadoop/conf/hdfs-site.xml: <value>kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms</value>

So my guess is that Flink takes this connection string from the Cloudera config and "forgets" to split it at the ";", so the whole "host1;host2" string is treated as a single host name and cannot be resolved. If I ping each of those hosts individually, everything works.

Do you have any hints on how to avoid this problem?

Best wishes
Dominique
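
P.S. To make the guess about the ";" concrete, here is a minimal shell sketch of the split I would have expected to happen somewhere. `split_kms_hosts` is a hypothetical helper written only for illustration; it is not part of Flink or Hadoop, and it assumes the value has the shape kms://<scheme>@<host1>;<host2>:<port>/<path> as in the grep output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Flink or Hadoop function): print one host per
# line from a KMS URI of the assumed form
#   kms://<scheme>@<host1>;<host2>:<port>/<path>
split_kms_hosts() {
  local uri="$1"
  local authority="${uri#kms://}"   # drop the "kms://" prefix
  authority="${authority#*@}"       # drop everything up to and including "@"
  authority="${authority%%/*}"      # drop the path after the first "/"
  local hosts="${authority%:*}"     # drop the trailing ":<port>", if any
  local IFS=';'                     # split the host list at ";"
  local host
  for host in $hosts; do
    printf '%s\n' "$host"
  done
}
```

Called with a made-up value such as 'kms://https@host1.example;host2.example:16000/kms', this prints the two hosts on separate lines, and each of them could then be checked on its own (for example with `getent hosts <host>`), which matches what I see when I ping the hosts one by one.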