Re: flink 1.7.2 YARN Session mode job submission: request for help

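2020-04-15 Post by tison
Note that the environment variables and fs.hdfs.hdfsdefault must point to an HDFS path or to a
local path known to the YARN cluster, not to a path on the client. They are actually resolved and fetched on the machine that launches the TM.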
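
A minimal sketch of a check one could run on the YARN node that launches the TM (not on the client), assuming Python 3 is available there. The helper name `describe_conf_location` and the example paths are hypothetical. It only tells an `hdfs://` URI apart from a local path and, for a local path, reports whether the directory exists on the current machine and contains `core-site.xml` and `hdfs-site.xml`.

```python
import os
from urllib.parse import urlparse


def describe_conf_location(location):
    """Classify a configured Hadoop conf location (hypothetical helper).

    Returns a dict saying whether the value is an HDFS URI or a local path
    and, for local paths, which expected files exist on THIS machine.
    """
    parsed = urlparse(location)
    if parsed.scheme == "hdfs":
        return {"kind": "hdfs", "authority": parsed.netloc, "path": parsed.path}
    expected = ["core-site.xml", "hdfs-site.xml"]
    present = [name for name in expected
               if os.path.isfile(os.path.join(location, name))]
    return {
        "kind": "local",
        "exists": os.path.isdir(location),
        "present": present,
        "missing": [name for name in expected if name not in present],
    }


if __name__ == "__main__":
    # Example values only; substitute the locations from your own setup.
    for value in ("hdfs://hdfsClusterForML/flink/conf", "/etc/hadoop/conf"):
        print(value, "->", describe_conf_location(value))
```

If a local path shows up as missing on the node while it exists on the client, that would match the situation described above.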

Best,
tison.


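Chief wrote on Wed, Apr 15, 2020 at 7:40 PM:

> hi Yangze Guo
> The environment variables you mentioned are already set in the current user's environment file; please see my problem description. At the moment, as long as the checkpoint path is not set to the
> NameNode HA nameservice, no error is reported and checkpoints work normally.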
>
>
>
>
> -- Original message --
> From: "Yangze Guo" Sent: Wednesday, April 15, 2020, 3:00 PM
> To: "user-zh"
> Subject: Re: flink 1.7.2 YARN Session mode job submission: request for help
>
>
>
> Flink needs the environment variable YARN_CONF_DIR or HADOOP_CONF_DIR to be set to the location of the Hadoop conf [1]
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html
>
> Best,
> Yangze Guo
>
> On Mon, Apr 13, 2020 at 10:52 PM Chief  
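>  Hi all,
>  The current environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS
> NameNode configured in HA mode. Submitting a job reports the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.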
> 
> 
> 2020-04-10 19:12:02,908 INFO
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Connecting to ResourceManager
> akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
> 2020-04-10 19:12:02,909 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool
>  - Cannot serve slot request, no ResourceManager connected.
> Adding as pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
> 2020-04-10 19:12:02,911 INFO
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Resolved ResourceManager address, beginning registration
> 2020-04-10 19:12:02,911 INFO
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Registration at ResourceManager attempt 1 (timeout=100ms)
> 2020-04-10 19:12:02,912 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool
>  - Cannot serve slot request, no ResourceManager connected.
> Adding as pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
> 2020-04-10 19:12:02,913 INFO
> org.apache.flink.yarn.YarnResourceManager
>  - Registering job manager
> 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
>  for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,917 INFO
> org.apache.flink.yarn.YarnResourceManager
>  - Registered job manager
> 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
>  for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,919 INFO
> org.apache.flink.runtime.jobmaster.JobMaster
>  - JobManager successfully registered at ResourceManager, leader
> id: .
> 2020-04-10 19:12:02,919 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool
>  - Requesting new slot
> [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] and profile
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
> nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,920 INFO
> org.apache.flink.yarn.YarnResourceManager
>  - Request slot with profile
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
> nativeMemoryInMB=0, networkMemoryInMB=0} for job
> 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id
> AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
> 2020-04-10 19:12:02,921 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool
>  - Requesting new slot
> [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] and profile
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
> nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,924 INFO
> org.apache.flink.yarn.YarnResourceManager
>  - Requesting new TaskExecutor container with resources
> 2020-04-10 19:12:02,926 INFO
> org.apache.flink.yarn.YarnResourceManager
>  - Request slot with profile
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0,
> nativeMemoryInMB=0, networkMemoryInMB=0} for job
> 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id
> AllocationID{37dd666a18040bf63ffbf2e022b2ea9b}.
>  2020-04-10 19:12:0

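Re: flink 1.7.2 YARN Session mode job submission: request for help

2020-04-15 Post by Chief
hi Yangze Guo
The environment variables you mentioned are already set in the current user's environment file; please see my problem description. At the moment, as long as the checkpoint path is not set to the NameNode
HA nameservice, no error is reported and checkpoints work normally.




-- Original message --
From: "Yangze Guo"

https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html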

Best,
Yangze Guo

On Mon, Apr 13, 2020 at 10:52 PM Chief 

Re: flink 1.7.2 YARN Session mode job submission: request for help

2020-04-15 Post by Yangze Guo
Flink needs the environment variable YARN_CONF_DIR or HADOOP_CONF_DIR to be set to the location of the Hadoop conf [1]

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html
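
A small sketch for confirming what the process environment actually contains, assuming Python 3. The function names are hypothetical, and it only reports the two variables named above without saying anything about which one takes priority.

```python
import os


def hadoop_conf_dirs(env=None):
    """Return {variable: value or None} for the two variables named above."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in ("YARN_CONF_DIR", "HADOOP_CONF_DIR")}


def has_conf_dir(env=None):
    """True if at least one of the variables is set to a non-empty value."""
    return any(hadoop_conf_dirs(env).values())


if __name__ == "__main__":
    for name, value in hadoop_conf_dirs().items():
        print(f"{name}={value if value else '<not set>'}")
    print("at least one set:", has_conf_dir())
```

Running it as the same user and from the same shell that starts the session shows whether the variables are visible to that process, which a value in a profile file alone does not guarantee.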

Best,
Yangze Guo

On Mon, Apr 13, 2020 at 10:52 PM Chief  wrote:
>
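> Hi all,
> The current environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS
> NameNode configured in HA mode. Submitting a job reports the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.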
>
>
> 2020-04-10 19:12:02,908 INFO 
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Connecting to ResourceManager 
> akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
> 2020-04-10 19:12:02,909 INFO 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool   
>   - Cannot serve slot request, no ResourceManager connected. 
> Adding as pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
> 2020-04-10 19:12:02,911 INFO 
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Resolved ResourceManager address, 
> beginning registration
> 2020-04-10 19:12:02,911 INFO 
> org.apache.flink.runtime.jobmaster.JobMaster
>  - Registration at ResourceManager attempt 
> 1 (timeout=100ms)
> 2020-04-10 19:12:02,912 INFO 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool   
>   - Cannot serve slot request, no ResourceManager connected. 
> Adding as pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
> 2020-04-10 19:12:02,913 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Registering job manager 
> 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
>  for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,917 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Registered job manager 
> 0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
>  for job 24691b33c18d7ad73b1f52edb3d68ae4.
> 2020-04-10 19:12:02,919 INFO 
> org.apache.flink.runtime.jobmaster.JobMaster
>  - JobManager successfully registered at 
> ResourceManager, leader id: .
> 2020-04-10 19:12:02,919 INFO 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool   
>   - Requesting new slot 
> [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] and profile 
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, 
> nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,920 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Request slot with profile 
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, 
> nativeMemoryInMB=0, networkMemoryInMB=0} for job 
> 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id 
> AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
> 2020-04-10 19:12:02,921 INFO 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPool   
>   - Requesting new slot 
> [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] and profile 
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, 
> nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
> 2020-04-10 19:12:02,924 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Requesting new TaskExecutor 
> container with resources  1.
> 2020-04-10 19:12:02,926 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Request slot with profile 
> ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, 
> nativeMemoryInMB=0, networkMemoryInMB=0} for job 
> 24691b33c18d7ad73b1f52edb3d68ae4 with allocation id 
> AllocationID{37dd666a18040bf63ffbf2e022b2ea9b}.
> 2020-04-10 19:12:06,531 INFO 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl   
>  - Received new token for : trusfortpoc3:35206
> 2020-04-10 19:12:06,543 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Received new container: 
> container_1586426824930_0006_01_02 - Remaining pending container 
> requests: 1
> 2020-04-10 19:12:06,543 INFO 
> org.apache.flink.yarn.YarnResourceManager 
>  - Removing container request 
> Capability[ 0.
> 2020-04-10 19:12:06,568 ERROR org.apache.flink.yarn.YarnResourceManager 
>  - Could 
> not start TaskManager in container container_1586426824930_0006_01_02.
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hdfsClusterForML
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient. at org.apache.hadoop.hdfs.DFSClient. at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at 
> 
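
The trace above goes through `createNonHAProxy` and ends in `UnknownHostException: hdfsClusterForML`, which is consistent with the HDFS client on that node not seeing HA settings for the nameservice and treating the name as a plain host name. A minimal sketch, assuming Python 3, that checks a Hadoop-style `hdfs-site.xml` for the usual client-side HA keys (`dfs.nameservices`, `dfs.ha.namenodes.<ns>`, `dfs.client.failover.proxy.provider.<ns>`). The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_ha_keys(props, nameservice):
    """List the HA client keys that are absent for the given nameservice."""
    missing = []
    services = [s.strip() for s in props.get("dfs.nameservices", "").split(",")]
    if nameservice not in services:
        missing.append("dfs.nameservices")
    for key in (f"dfs.ha.namenodes.{nameservice}",
                f"dfs.client.failover.proxy.provider.{nameservice}"):
        if not props.get(key):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Sample content only; in practice read the hdfs-site.xml found on the
    # node that launches the TaskManager.
    sample = """<configuration>
      <property><name>dfs.nameservices</name><value>hdfsClusterForML</value></property>
    </configuration>"""
    print(missing_ha_keys(load_properties(sample), "hdfsClusterForML"))
```

An empty list would suggest the file is complete for that nameservice; a non-empty list would point at the keys to look for on that node.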

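flink 1.7.2 YARN Session mode job submission: request for help

2020-04-13 Post by Chief
Hi all,
The current environment is flink 1.7.2, submitting jobs in YARN Session mode, with Hadoop 2.7.3 and the HDFS
NameNode configured in HA mode. Submitting a job reports the error below. HADOOP_HOME, YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH are already set in the system environment variables, and fs.hdfs.hadoopconf is configured in flink_conf.yaml.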


2020-04-10 19:12:02,908 INFO 
org.apache.flink.runtime.jobmaster.JobMaster 
- Connecting to ResourceManager 
akka.tcp://flink@trusfortpoc1:23584/user/resourcemanager()
2020-04-10 19:12:02,909 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool
 - Cannot serve slot request, no ResourceManager connected. Adding as 
pending request [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}]
2020-04-10 19:12:02,911 INFO 
org.apache.flink.runtime.jobmaster.JobMaster 
- Resolved ResourceManager address, beginning 
registration
2020-04-10 19:12:02,911 INFO 
org.apache.flink.runtime.jobmaster.JobMaster 
- Registration at ResourceManager attempt 1 
(timeout=100ms)
2020-04-10 19:12:02,912 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool
 - Cannot serve slot request, no ResourceManager connected. Adding as 
pending request [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}]
2020-04-10 19:12:02,913 INFO 
org.apache.flink.yarn.YarnResourceManager 
 - Registering job manager 
0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
 for job 24691b33c18d7ad73b1f52edb3d68ae4.
2020-04-10 19:12:02,917 INFO 
org.apache.flink.yarn.YarnResourceManager 
 - Registered job manager 
0...@akka.tcp://flink@trusfortpoc1:23584/user/jobmanager_0
 for job 24691b33c18d7ad73b1f52edb3d68ae4.
2020-04-10 19:12:02,919 INFO 
org.apache.flink.runtime.jobmaster.JobMaster 
- JobManager successfully registered at 
ResourceManager, leader id: .
2020-04-10 19:12:02,919 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool
 - Requesting new slot [SlotRequestId{35ad2384e9cd0efd30b43f5302db24b6}] 
and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, 
directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource 
manager.
2020-04-10 19:12:02,920 INFO 
org.apache.flink.yarn.YarnResourceManager 
 - Request slot with profile 
ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, 
nativeMemoryInMB=0, networkMemoryInMB=0} for job 
24691b33c18d7ad73b1f52edb3d68ae4 with allocation id 
AllocationID{5a12237c7f2bd8b1cc760ddcbab5a1c0}.
2020-04-10 19:12:02,921 INFO 
org.apache.flink.runtime.jobmaster.slotpool.SlotPool
 - Requesting new slot [SlotRequestId{0feacbb4fe16c8c7a70249f1396565d0}] 
and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, 
directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource 
manager.
2020-04-10 19:12:02,924 INFO 
org.apache.flink.yarn.YarnResourceManager 
 - Requesting new TaskExecutor 
container with resources 
