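I'm going to put this issue on hold for a while, since this part is fairly complex. It may involve a custom scheduler as well as a custom Hadoop authentication mechanism. I'm not clear on the details myself yet and need to ask the colleagues who run our company's infrastructure.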

On Tue, Aug 25, 2020 at 11:21 AM, Yang Wang <[email protected]> wrote:

>
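> Have you confirmed that the queue upd_security actually exists? Also, is your
> YARN cluster's scheduler the CapacityScheduler or the FairScheduler?
> If it is the FairScheduler, you need to specify the full queue name, not just
> the leaf queue name.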
>
>
> Best,
> Yang
>
> On Mon, Aug 24, 2020 at 10:55 AM, 赵一旦 <[email protected]> wrote:
>
> > For example, today I tried this command:
> > ./bin/yarn-session.sh -nm test_flink -q -qu upd_security -s 1 -tm 3024MB -jm 3024MB
> > I had also set export HADOOP_USER_NAME=xxx, and at startup the log shows:
> > org.apache.flink.runtime.security.modules.HadoopModule  - Hadoop user set to upd_security (auth:SIMPLE)
> >
> > It then fails with:
> >
> > 2020-08-24 10:52:31 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink session.
> > java.lang.RuntimeException: Couldn't get cluster description
> >         at org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1254)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:534)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> >         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
> > Caused by: java.lang.NullPointerException: null
> >         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getChildQueues(YarnClientImpl.java:587)
> >         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getAllQueues(YarnClientImpl.java:557)
> >         at org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1247)
> >         ... 7 common frames omitted
> >
> > ------------------------------------------------------------
> >  The program finished with the following exception:
> >
> > java.lang.RuntimeException: Couldn't get cluster description
> >         at org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1254)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:534)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> >         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> >         at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
> > Caused by: java.lang.NullPointerException
> >         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getChildQueues(YarnClientImpl.java:587)
> >         at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getAllQueues(YarnClientImpl.java:557)
> >         at org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1247)
> >         ... 7 more
> >
> >
> >
> >
> >
> > On Mon, Aug 24, 2020 at 10:00 AM, caozhen <[email protected]> wrote:
> >
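> > > The error means there were not enough vcores when the AM requested resources.
> > >
> > > 1. Check whether the current queue has enough vcores.
> > > 2. Check the maximum number of applications the current queue allows.
> > >
> > > When I ran into this problem before, the cause was that the queue's resources had not been configured properly.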
> > >
> > >
> > >
> > > --
> > > Sent from: http://apache-flink.147419.n8.nabble.com/
> >
>
