[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802688#comment-17802688
 ] 

Shilun Fan commented on YARN-7884:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0. Please move this 
issue back if it is a blocker.

> Race condition in registering YARN service in ZooKeeper
> ---
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when 
> registering a YARN service.
> Creation of the yarn-service znode seems to happen after the AM has started 
> and is reporting back to update component information.  The YARN service 
> should have access to create the znode, but for some reason NoAuth is reported.
> {code}
> 2018-02-02 22:53:30,442 [main] INFO  service.ServiceScheduler - Set registry 
> user accounts: sasl:hbase
> 2018-02-02 22:53:30,471 [main] INFO  zk.RegistrySecurity - Registry default 
> system acls: 
> [1,s{'world,'anyone}
> , 31,s{'sasl,'yarn}
> , 31,s{'sasl,'jhs}
> , 31,s{'sasl,'hdfs-demo}
> , 31,s{'sasl,'rm}
> , 31,s{'sasl,'hive}
> ]
> 2018-02-02 22:53:30,472 [main] INFO  zk.RegistrySecurity - Registry User ACLs 
> [31,s{'sasl,'hbase}
> , 31,s{'sasl,'hbase}
> ]
> 2018-02-02 22:53:30,503 [main] INFO  event.AsyncDispatcher - Registering 
> class org.apache.hadoop.yarn.service.component.ComponentEventType for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
> 2018-02-02 22:53:30,504 [main] INFO  event.AsyncDispatcher - Registering 
> class 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType 
> for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
> 2018-02-02 22:53:30,528 [main] INFO  impl.NMClientAsyncImpl - Upper bound of 
> the thread pool size is 500
> 2018-02-02 22:53:30,531 [main] INFO  service.ServiceMaster - Starting service 
> as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
> 2018-02-02 22:53:30,545 [main] INFO  ipc.CallQueueManager - Using callQueue: 
> class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO  ipc.Server - 
> Starting Socket Reader #1 for port 56859
> 2018-02-02 22:53:30,589 [main] INFO  pb.RpcServerFactoryPBImpl - Adding 
> protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to 
> the server
> 2018-02-02 22:53:30,606 [IPC Server Responder] INFO  ipc.Server - IPC Server 
> Responder: starting
> 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO  ipc.Server - IPC 
> Server listener on 56859: starting
> 2018-02-02 22:53:30,607 [main] INFO  service.ClientAMService - Instantiated 
> ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
> 2018-02-02 22:53:30,609 [main] INFO  zk.CuratorService - Creating 
> CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 
> 2018-02-02 22:53:30,615 [main] INFO  zk.RegistrySecurity - Enabling ZK sasl 
> client: jaasClientEntry = Client, principal = 
> hbase/eyang-5.openstacklo...@example.com, keytab = 
> /etc/security/keytabs/hbase.service.keytab
> 2018-02-02 22:53:30,752 [main] INFO  client.RMProxy - Connecting to 
> ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
> 2018-02-02 22:53:30,909 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1517611904996_0001_01, abc into registry
> 2018-02-02 22:53:30,911 [main] INFO  service.ServiceScheduler - Received 0 
> containers from previous attempt.
> 2018-02-02 22:53:31,072 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: `/users/hbase/services/yarn-service/abc/components': No 
> such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hbase/services/yarn-service/abc/components
> 2018-02-02 22:53:31,074 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component sleeper
> 2018-02-02 22:53:31,075 [main] INFO  component.Component - [INIT COMPONENT 
> sleeper]: 2 instances.
> 2018-02-02 22:53:31,094 [main] INFO  component.Component - [COMPONENT 
> sleeper] Transitioned from INIT to FLEXING on FLEX event.
> 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - 
> Failed to register app abc in registry
> org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: 
> `/registry/users/hbase/services/yarn-service/abc': Not authorized to access 
> path; ACLs: [
> 0x01: 'world,'anyone
>  0x1f: 'sasl,'yarn
>  0x1f: 'sasl,'jhs
>  0x1f: 'sasl,'hdfs-demo
>  0x1f: 'sasl,'rm
>  0x1f: 'sasl,'hive
>  0x1f: 'sasl,'hbase
>  

[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper

2018-02-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351175#comment-16351175
 ] 

Eric Yang commented on YARN-7884:
-

ServiceScheduler sets the ACLs of znode /registry/users/hbase/services/yarn-service to:
{code:java}
'world,'anyone
: r
'sasl,'yarn
: cdrwa
'sasl,'rm
: cdrwa
'sasl,'hbase
: cdrwa{code}
For some reason, the world:anyone:r permission is injected into the 
yarn-service node, and this appears to prevent the child node from being written.

In theory, evaluating the sasl:rm or sasl:hbase entry should allow the child 
node to be written, but this is not happening.
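
To make the "in theory" reasoning concrete, here is a minimal sketch of how the ACL above would be evaluated. The permission bit values are the ones ZooKeeper defines (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16), which is why 31 (0x1f) in the log corresponds to cdrwa. The helper names decode_perms and is_allowed are hypothetical, and the identity matching is a deliberately simplified model (exact string match on scheme and id), not ZooKeeper's actual authentication provider logic.

```python
# ZooKeeper permission bits (values as defined in ZooDefs.Perms).
READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16


def decode_perms(mask):
    """Render a permission bitmask in the 'cdrwa' order printed by getAcl."""
    letters = [(CREATE, "c"), (DELETE, "d"), (READ, "r"), (WRITE, "w"), (ADMIN, "a")]
    return "".join(ch for bit, ch in letters if mask & bit)


def is_allowed(acl, session_ids, perm):
    """Simplified model (assumption): ACL entries are additive, so access is
    granted if ANY entry that matches the session carries the permission."""
    for scheme, ident, mask in acl:
        matches = (scheme, ident) == ("world", "anyone") or (scheme, ident) in session_ids
        if matches and mask & perm:
            return True
    return False


# ACL reported on /registry/users/hbase/services/yarn-service in this comment.
yarn_service_acl = [
    ("world", "anyone", READ),
    ("sasl", "yarn", 31),
    ("sasl", "rm", 31),
    ("sasl", "hbase", 31),
]

print(decode_perms(31))                                            # cdrwa
print(decode_perms(READ))                                          # r
print(is_allowed(yarn_service_acl, {("sasl", "hbase")}, CREATE))   # True
print(is_allowed(yarn_service_acl, set(), CREATE))                 # False
```

Under this model the world:anyone:r entry cannot take CREATE away from a session that matches sasl:hbase, because entries only grant and never deny. A NoAuth on child creation would therefore suggest that the session was not matched to any of the sasl entries at the time of the write. That is only what the model implies, not a confirmed diagnosis of this issue.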

 
