Eric Yang created YARN-7884:
-------------------------------

             Summary: Race condition in registering YARN service in ZooKeeper
                 Key: YARN-7884
                 URL: https://issues.apache.org/jira/browse/YARN-7884
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.1.0
            Reporter: Eric Yang


In a Kerberos-enabled cluster, there seems to be a race condition when
registering a YARN service.

The yarn-service znode creation seems to happen after the AM has started and is
reporting back to update component information. The YARN service should have
permission to create the znode, but for some reason ZooKeeper reported NoAuth.

{code}
2018-02-02 22:53:30,442 [main] INFO  service.ServiceScheduler - Set registry user accounts: sasl:hbase
2018-02-02 22:53:30,471 [main] INFO  zk.RegistrySecurity - Registry default system acls: 
[1,s{'world,'anyone}
, 31,s{'sasl,'yarn}
, 31,s{'sasl,'jhs}
, 31,s{'sasl,'hdfs-demo}
, 31,s{'sasl,'rm}
, 31,s{'sasl,'hive}
]
2018-02-02 22:53:30,472 [main] INFO  zk.RegistrySecurity - Registry User ACLs 
[31,s{'sasl,'hbase}
, 31,s{'sasl,'hbase}
]
2018-02-02 22:53:30,503 [main] INFO  event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
2018-02-02 22:53:30,504 [main] INFO  event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
2018-02-02 22:53:30,528 [main] INFO  impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2018-02-02 22:53:30,531 [main] INFO  service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
2018-02-02 22:53:30,545 [main] INFO  ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO  ipc.Server - Starting Socket Reader #1 for port 56859
2018-02-02 22:53:30,589 [main] INFO  pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
2018-02-02 22:53:30,606 [IPC Server Responder] INFO  ipc.Server - IPC Server Responder: starting
2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO  ipc.Server - IPC Server listener on 56859: starting
2018-02-02 22:53:30,607 [main] INFO  service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
2018-02-02 22:53:30,609 [main] INFO  zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 
2018-02-02 22:53:30,615 [main] INFO  zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab
2018-02-02 22:53:30,752 [main] INFO  client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
2018-02-02 22:53:30,909 [main] INFO  service.ServiceScheduler - Registering appattempt_1517611904996_0001_000001, abc into registry
2018-02-02 22:53:30,911 [main] INFO  service.ServiceScheduler - Received 0 containers from previous attempt.
2018-02-02 22:53:31,072 [main] INFO  service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
2018-02-02 22:53:31,074 [main] INFO  service.ServiceScheduler - Triggering initial evaluation of component sleeper
2018-02-02 22:53:31,075 [main] INFO  component.Component - [INIT COMPONENT sleeper]: 2 instances.
2018-02-02 22:53:31,094 [main] INFO  component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [
0x01: 'world,'anyone
 0x1f: 'sasl,'yarn
 0x1f: 'sasl,'jhs
 0x1f: 'sasl,'hdfs-demo
 0x1f: 'sasl,'rm
 0x1f: 'sasl,'hive
 0x1f: 'sasl,'hbase
 0x1f: 'sasl,'hbase
 ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
        at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:116)
        at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:195)
        at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
        at org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:462)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
        at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:260)
        at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:214)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:635)
        ... 12 more
2018-02-02 22:53:33,135 [AMRM Callback Handler Thread] INFO  service.ServiceScheduler - 2 containers allocated. 
{code}
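
The ACL entries in the log and in the exception are permission bitmasks (1 and 31 in the RegistrySecurity lines, 0x01 and 0x1f in the exception). As a rough aid to reading them, the sketch below decodes those two values using the bit values defined in ZooKeeper's ZooDefs.Perms. The class AclPerms and its helper methods are hypothetical names used only for illustration; they are not part of Hadoop or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes ZooKeeper ACL permission bitmasks such as
// the 0x01 and 0x1f values printed in the NoPathPermissionsException.
class AclPerms {
    // Bit values as defined in org.apache.zookeeper.ZooDefs.Perms.
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;

    // Returns the names of all permission bits set in the mask.
    static List<String> decode(int perms) {
        List<String> names = new ArrayList<>();
        if ((perms & READ) != 0) names.add("READ");
        if ((perms & WRITE) != 0) names.add("WRITE");
        if ((perms & CREATE) != 0) names.add("CREATE");
        if ((perms & DELETE) != 0) names.add("DELETE");
        if ((perms & ADMIN) != 0) names.add("ADMIN");
        return names;
    }

    // True if the mask grants permission to create child znodes.
    static boolean canCreate(int perms) {
        return (perms & CREATE) != 0;
    }

    public static void main(String[] args) {
        // world:anyone entry from the log
        System.out.println("0x01 -> " + decode(0x01));
        // sasl:hbase entry from the log
        System.out.println("0x1f -> " + decode(0x1f));
    }
}
```

Read this way, sasl:hbase is listed with every permission, including CREATE, which is consistent with the report that the service should have been allowed to create the znode. Note that ZooKeeper checks CREATE against the ACL of the parent znode, so the list printed in the exception may not be the one that was actually enforced.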


