Re: Help with a job submission problem in flink 1.7.2 YARN Session mode
Note that the environment variables and fs.hdfs.hdfsdefault must point to an HDFS path or to a local path that is known on the YARN cluster, not to a path on the client, because they are actually resolved and fetched on the machine that launches the TM.

Best,
tison.

On Wed, Apr 15, 2020 at 7:40 PM, Chief wrote:
> hi Yangze Guo,
> The environment variables you mentioned are already set in the current user's environment variable file; please take a look at my problem description. Right now, if the checkpoint path is not set to the NameNode HA nameservice, no error is reported and checkpoints work normally.
>
> ------------------ Original message ------------------
> From: "Yangze Guo"
> Sent: Wednesday, April 15, 2020, 3:00 PM
> To: "user-zh"
> Subject: Re: Help with a job submission problem in flink 1.7.2 YARN Session mode
>
> Flink needs the environment variable YARN_CONF_DIR or HADOOP_CONF_DIR to point to the location of the Hadoop configuration [1]
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html
>
> Best,
> Yangze Guo
>
> On Mon, Apr 13, 2020 at 10:52 PM Chief
> > Hi all,
> > Our environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS NameNode configured in HA mode. Submitting a job produces the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.
> >
> > 2020-04-10 19:12:02,908 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
> > 2020-04-10 19:12:02,909 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
> > 2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
> > 2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
> > 2020-04-10 19:12:02,912 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
> > 2020-04-10 19:12:02,913 INFO org.apache.flink.yarn.YarnResourceManager - Registering job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
> > 2020-04-10 19:12:02,917 INFO org.apache.flink.yarn.YarnResourceManager - Registered job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
> > 2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: .
> > 2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> > 2020-04-10 19:12:02,920 INFO org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
> > 2020-04-10 19:12:02,921 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> > 2020-04-10 19:12:02,924 INFO org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources
> > 2020-04-10 19:12:02,926 INFO org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id AllocationID{37dd666a18040bf63ffbf2e022b2ea9b}.
> > 2020-04-10 19:12:0
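Since tison points out that these settings are resolved on the machine that starts the TaskManager, one way to check a cluster is to inspect the Hadoop configuration that actually exists on each NodeManager host. The sketch below is a hypothetical diagnostic, not part of Flink or Hadoop: it parses `hdfs-site.xml` from a local directory and lists which standard HDFS HA client properties (`dfs.nameservices`, `dfs.ha.namenodes.<ns>`, `dfs.namenode.rpc-address.<ns>.<nn>`, `dfs.client.failover.proxy.provider.<ns>`) are missing for the nameservice named in a checkpoint URI. The function names are made up for illustration, and the parser assumes the usual `<configuration><property><name/><value/>` layout with no `${...}` substitution or XInclude. The `NameNodeProxies.createNonHAProxy` frame under `UnknownHostException: hdfsClusterForML` in the stack trace later in this thread suggests the TaskManager's HDFS client found no HA settings for that name and treated it as a plain hostname; that reading is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse


def load_hadoop_props(xml_path):
    """Parse a Hadoop *-site.xml file into a {name: value} dict (no ${var} expansion)."""
    props = {}
    for prop in ET.parse(str(xml_path)).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def _csv(value):
    """Split a comma-separated Hadoop property value, dropping blank items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_settings(conf_dir, checkpoint_uri):
    """List HDFS HA client properties that conf_dir/hdfs-site.xml lacks for the
    nameservice used as the authority of checkpoint_uri (hypothetical helper).

    An empty list means every checked property is present.
    """
    ns = urlparse(checkpoint_uri).netloc  # e.g. "hdfsClusterForML"
    props = load_hadoop_props(Path(conf_dir) / "hdfs-site.xml")
    missing = []
    if ns not in _csv(props.get("dfs.nameservices", "")):
        missing.append("dfs.nameservices (should list %s)" % ns)
    nn_ids = _csv(props.get("dfs.ha.namenodes." + ns, ""))
    if not nn_ids:
        missing.append("dfs.ha.namenodes." + ns)
    for nn in nn_ids:
        key = "dfs.namenode.rpc-address.%s.%s" % (ns, nn)
        if key not in props:
            missing.append(key)
    # Assumption: without a failover proxy provider for ns, the HDFS client
    # does not treat ns as a logical nameservice and resolves it as a host.
    provider = "dfs.client.failover.proxy.provider." + ns
    if provider not in props:
        missing.append(provider)
    return missing
```

For example, calling `missing_ha_settings("/etc/hadoop/conf", "hdfs://hdfsClusterForML/flink/checkpoints")` on a NodeManager host (both arguments are illustrative) should return an empty list if that host's `hdfs-site.xml` fully defines the nameservice.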
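Re: Help with a job submission problem in flink 1.7.2 YARN Session mode
hi Yangze Guo,
The environment variables you mentioned are already set in the current user's environment variable file; please take a look at my problem description. Right now, if the checkpoint path is not set to the NameNode HA nameservice, no error is reported and checkpoints work normally.

------------------ Original message ------------------
From: "Yangze Guo"
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html
Best,
Yangze Guo
On Mon, Apr 13, 2020 at 10:52 PM Chief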
Re: Help with a job submission problem in flink 1.7.2 YARN Session mode
Flink needs the environment variable YARN_CONF_DIR or HADOOP_CONF_DIR to point to the location of the Hadoop configuration [1]

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html

Best,
Yangze Guo

On Mon, Apr 13, 2020 at 10:52 PM Chief wrote:
>
> Hi all,
> Our environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS NameNode configured in HA mode. Submitting a job produces the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.
>
> 2020-04-10 19:12:02,908 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
> 2020-04-10 19:12:02,909 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
> 2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
> 2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
> 2020-04-10 19:12:02,912 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
> 2020-04-10 19:12:02,913 INFO org.apache.flink.yarn.YarnResourceManager - Registering job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,917 INFO org.apache.flink.yarn.YarnResourceManager - Registered job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: .
> 2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,920 INFO org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
> 2020-04-10 19:12:02,921 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,924 INFO org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources 1.
> 2020-04-10 19:12:02,926 INFO org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id AllocationID{37dd666a18040bf63ffbf2e022b2ea9b}.
> 2020-04-10 19:12:06,531 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Received new token for : trusfortpoc3:35206
> 2020-04-10 19:12:06,543 INFO org.apache.flink.yarn.YarnResourceManager - Received new container: container_1586426824930_0006_01_02 - Remaining pending container requests: 1
> 2020-04-10 19:12:06,543 INFO org.apache.flink.yarn.YarnResourceManager - Removing container request Capability[ 0.
> 2020-04-10 19:12:06,568 ERROR org.apache.flink.yarn.YarnResourceManager - Could not start TaskManager in container container_1586426824930_0006_01_02.
> java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfsClusterForML
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
>     at org.apache.hadoop.hdfs.DFSClient.<init>
>     at org.apache.hadoop.hdfs.DFSClient.<init>
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
>     at
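Yangze's suggestion relies on environment variables, and those are read from the environment of whichever process performs the lookup. A minimal sketch of that idea, assuming a simplified search order (`HADOOP_CONF_DIR`, then `YARN_CONF_DIR`, then `$HADOOP_HOME/etc/hadoop` and `$HADOOP_HOME/conf`) that is chosen for illustration and is not Flink's exact lookup logic: it returns the first candidate directory that contains an `hdfs-site.xml`. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def candidate_conf_dirs(env):
    """Candidate Hadoop conf dirs derived from an environment mapping.

    The order is an illustrative assumption, not Flink's exact lookup order.
    """
    dirs = []
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        if env.get(var):
            dirs.append(Path(env[var]))
    if env.get("HADOOP_HOME"):
        home = Path(env["HADOOP_HOME"])
        dirs += [home / "etc" / "hadoop", home / "conf"]
    return dirs


def find_hadoop_conf_dir(env=None):
    """Return the first candidate dir containing hdfs-site.xml, or None.

    Defaults to this process's own environment (os.environ).
    """
    env = os.environ if env is None else env
    for d in candidate_conf_dirs(env):
        if (d / "hdfs-site.xml").is_file():
            return d
    return None
```

Called without arguments, `find_hadoop_conf_dir()` inspects only the current process's `os.environ` on the current machine. Running it on the client and on a NodeManager host can show whether each machine resolves to a directory at all; per tison's reply, it is the view from the host that launches the TaskManager that matters.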
Help with a job submission problem in flink 1.7.2 YARN Session mode
Hi all,
Our environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS NameNode configured in HA mode. Submitting a job produces the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.

2020-04-10 19:12:02,908 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
2020-04-10 19:12:02,909 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
2020-04-10 19:12:02,911 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
2020-04-10 19:12:02,912 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
2020-04-10 19:12:02,913 INFO org.apache.flink.yarn.YarnResourceManager - Registering job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
2020-04-10 19:12:02,917 INFO org.apache.flink.yarn.YarnResourceManager - Registered job manager 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0 for job 24691b33c18d7ad73b1f52edb3d68ae4.
2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: .
2020-04-10 19:12:02,919 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2020-04-10 19:12:02,920 INFO org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
2020-04-10 19:12:02,921 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2020-04-10 19:12:02,924 INFO org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources