Hi all, I have a question. Hadoop delegation tokens expire and are removed once they exceed the max lifetime (7 days by default). For long-running jobs, the YARN documentation describes three strategies for dealing with this: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#securing-long-lived-yarn-services
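For context, the 7-day limit mentioned above corresponds to HDFS NameNode settings. A sketch of the relevant properties with their stock defaults (illustrative only, not taken from the cluster in question):

```xml
<!-- hdfs-site.xml: delegation token lifetimes (Hadoop defaults, in ms) -->
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value> <!-- 7 days: a token cannot be renewed past this -->
</property>
<property>
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>86400000</value> <!-- 24 hours: each renewal extends validity by this much -->
</property>
```

Past the max lifetime the NameNode purges the token from its cache, which is why renewal alone cannot keep a job alive beyond 7 days; a new token must be obtained.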
I'd like to understand how Flink on YARN handles Hadoop delegation token expiry; the official docs don't seem to explain it clearly. We hit the following failure in production: Flink 1.12 on YARN, with the YARN NodeManagers deployed in containers that occasionally crash and restart. After a Flink job has been running for more than 7 days, if the NodeManager hosting that job's JM (AM) restarts, the AM makes a new attempt. The attempt picks up the token 1377**** that was obtained at job submission, but that token has already been purged from the NameNode, so the attempt fails with:

Failing this attempt. Diagnostics: token (HDFS_DELEGATION_TOKEN token 1377**** for user***) can't be found in cache
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1377**** for user***) can't be found in cache

Questions:
1. After a Hadoop delegation token is purged, how does Flink on YARN refresh it? Does it obtain a new token?
2. If a new token is obtained, why does the AM attempt still pick up the purged token (1377****)?
3. Could this failure be related to the containerized NodeManager deployment, i.e. the files holding the keytab being wiped when the NodeManager restarts?
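For reference, one of the strategies from the YARN doc linked above is to ship a keytab with the application so it can log in and fetch fresh tokens itself instead of relying on token renewal. In Flink this is configured via the Kerberos login options; a minimal flink-conf.yaml sketch (the keytab path and principal below are placeholders, not values from the original post):

```yaml
# Have Flink log in from a keytab so the JM/TMs can obtain fresh
# HDFS delegation tokens, rather than reusing the one from submission.
# Path and principal are placeholders.
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.COM
```

With this setup YARN localizes the keytab into the AM container; if the NodeManager's local directories are wiped on restart, the localized keytab could plausibly be lost, which may be relevant to question 3.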