[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper
[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802688#comment-17802688 ]

Shilun Fan commented on YARN-7884:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0.

> Race condition in registering YARN service in ZooKeeper
> ---
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Affects Versions: 3.1.0
> Reporter: Eric Yang
> Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when
> registering a YARN service.
> Creation of the yarn-service znode seems to happen after the AM has started
> and is reporting back to update component information. The YARN service
> should have permission to create the znode, but for some reason NoAuth was
> reported.
> {code}
> 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry
> user accounts: sasl:hbase
> 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default
> system acls:
> [1,s{'world,'anyone}
> , 31,s{'sasl,'yarn}
> , 31,s{'sasl,'jhs}
> , 31,s{'sasl,'hdfs-demo}
> , 31,s{'sasl,'rm}
> , 31,s{'sasl,'hive}
> ]
> 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs
> [31,s{'sasl,'hbase}
> , 31,s{'sasl,'hbase}
> ]
> 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering
> class org.apache.hadoop.yarn.service.component.ComponentEventType for class
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
> 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering
> class
> org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType
> for class
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
> 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of
> the thread pool size is 500
> 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service
> as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
> 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue:
> class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler:
> class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server -
> Starting Socket Reader #1 for port 56859
> 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding
> protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to
> the server
> 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server
> Responder: starting
> 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC
> Server listener on 56859: starting
> 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated
> ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
> 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating
> CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181"
> 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl
> client: jaasClientEntry = Client, principal =
> hbase/eyang-5.openstacklo...@example.com, keytab =
> /etc/security/keytabs/hbase.service.keytab
> 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to
> ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
> 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering
> appattempt_1517611904996_0001_01, abc into registry
> 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0
> containers from previous attempt.
> 2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not
> read component paths: `/users/hbase/services/yarn-service/abc/components': No
> such file or directory: KeeperErrorCode = NoNode for
> /registry/users/hbase/services/yarn-service/abc/components
> 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering
> initial evaluation of component sleeper
> 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT
> sleeper]: 2 instances.
> 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT
> sleeper] Transitioned from INIT to FLEXING on FLEX event.
> 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler -
> Failed to register app abc in registry
> org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException:
> `/registry/users/hbase/services/yarn-service/abc': Not authorized to access
> path; ACLs: [
> 0x01: 'world,'anyone
> 0x1f: 'sasl,'yarn
> 0x1f: 'sasl,'jhs
> 0x1f: 'sasl,'hdfs-demo
> 0x1f: 'sasl,'rm
> 0x1f: 'sasl,'hive
> 0x1f: 'sasl,'hbase
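
The ACL entries in the log above print permissions as integers (1 and 31, or 0x01 and 0x1f). As a reading aid, here is a minimal Python sketch that decodes such a mask into the letters zkCli prints, using the bit values from ZooKeeper's ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). The helper name decode_perms is hypothetical and is not part of Hadoop or ZooKeeper.

```python
# ZooKeeper permission bits (ZooDefs.Perms), listed in zkCli's "cdrwa" order.
PERMS = [
    ("c", 4),   # CREATE: may create child znodes
    ("d", 8),   # DELETE: may delete child znodes
    ("r", 1),   # READ: may read data and list children
    ("w", 2),   # WRITE: may set data
    ("a", 16),  # ADMIN: may set ACLs
]


def decode_perms(mask: int) -> str:
    """Return the subset of 'cdrwa' whose bits are set in mask (hypothetical helper)."""
    return "".join(letter for letter, bit in PERMS if mask & bit)


if __name__ == "__main__":
    print(decode_perms(0x01))  # the world:anyone entry in the log
    print(decode_perms(0x1F))  # the sasl entries in the log
```

Read this way, the world:anyone entry carries only READ, while every sasl entry in the log carries all five permissions.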
[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper
[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351175#comment-16351175 ]

Eric Yang commented on YARN-7884:
-

ServiceScheduler sets the ACLs on znode /registry/users/hbase/services/yarn-service to:
{code:java}
'world,'anyone : r
'sasl,'yarn : cdrwa
'sasl,'rm : cdrwa
'sasl,'hbase : cdrwa
{code}
For some reason, the world:anyone:r permission is injected into the yarn-service node and prevents the child node from being written. In theory, the evaluation of sasl:rm or sasl:hbase should allow the child node to be written, but this is not happening.

> Race condition in registering YARN service in ZooKeeper
> ---
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Affects Versions: 3.1.0
> Reporter: Eric Yang
> Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when
> registering a YARN service.
> Creation of the yarn-service znode seems to happen after the AM has started
> and is reporting back to update component information. The YARN service
> should have permission to create the znode, but for some reason NoAuth was
> reported.
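
The comment above expects the sasl:rm or sasl:hbase entries to permit writing the child node. A simplified model of how ZooKeeper evaluates an ACL list can be sketched as follows, assuming an entry grants access when it carries the requested permission bit and its identity is either world:anyone or one of the session's authenticated identities. The names is_allowed and YARN_SERVICE_ACLS are hypothetical; the real server delegates identity matching to pluggable authentication providers and has a superuser bypass, both omitted here. This is an illustration only, not a diagnosis of YARN-7884.

```python
from typing import List, Set, Tuple

READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16
ALL = READ | WRITE | CREATE | DELETE | ADMIN  # 31, shown as 0x1f in the log

Identity = Tuple[str, str]       # (scheme, id), e.g. ("sasl", "hbase")
Acl = Tuple[int, str, str]       # (perms, scheme, id)

# ACLs on /registry/users/hbase/services/yarn-service, as quoted in the comment.
YARN_SERVICE_ACLS: List[Acl] = [
    (READ, "world", "anyone"),
    (ALL, "sasl", "yarn"),
    (ALL, "sasl", "rm"),
    (ALL, "sasl", "hbase"),
]


def is_allowed(acls: List[Acl], session_ids: Set[Identity], needed: int) -> bool:
    """Simplified ACL check: any single matching entry with the bit grants access."""
    if not acls:
        return True  # a znode without ACLs is treated as open
    for perms, scheme, ident in acls:
        if not perms & needed:
            continue  # this entry does not carry the requested permission
        if (scheme, ident) == ("world", "anyone") or (scheme, ident) in session_ids:
            return True
    return False


if __name__ == "__main__":
    # Creating a child znode needs CREATE on the parent.
    print(is_allowed(YARN_SERVICE_ACLS, {("sasl", "hbase")}, CREATE))
    print(is_allowed(YARN_SERVICE_ACLS, set(), CREATE))
    print(is_allowed(YARN_SERVICE_ACLS, set(), READ))
```

Under this model the read-only world:anyone entry cannot by itself block a create. A NoAuth result would instead suggest that none of the sasl identities matched the session at the moment the request was checked.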