Hi all, I'm currently running LLAP on kerberized HDP cluster (v 3.1.0-78), somehow when I run this query:
use default; CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, destination String) COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE; INSERT INTO employee VALUES (1, 'toto', 'toto', 'toto'), (2, 'titi', 'titi', 'titi'); select * from employee; It fails on the INSERT statement, the DAGAppMaster seems in an unconfigured context. Because in LlapTaskCommunicator.java, this check fails line 165 on kerberized cluster: Preconditions.checkState((token != null) == UserGroupInformation.isSecurityEnabled()); In fact I logged UserGroupInformation.isSecurityEnabled and it returns False, so if I add : UserGroupInformation.setConfiguration(conf); Preconditions.checkState((token != null) == UserGroupInformation.isSecurityEnabled()); The DAGAppMaster pass the inited state, start correctly and the query is executed successfully. I think it's related to https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java #L2365<%20https:/github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java%20#L2365>: Configuration conf = new Configuration(new YarnConfiguration()); ConfigurationProto confProto = TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name())); TezUtilsInternal.addUserSpecifiedTezConfiguration(conf, confProto.getConfKeyValuesList()); AMPluginDescriptorProto amPluginDescriptorProto = null; if (confProto.hasAmPluginDescriptor()) { amPluginDescriptorProto = confProto.getAmPluginDescriptor(); } UserGroupInformation.setConfiguration(conf); If I understand correctly (I'm not a java programmer) a default configuration is created and then passed to UGI which results in SIMPLE authentication seen has active because it on by default. I don't think it's related to misconfiguration, Kerberos is enabled and working for HDFS/YARN/Hive, doAs is set to false for LLAP. The application logs of failing YARN container: 2019-10-03 12:04:10,410 [DEBUG] [main] |security.UserGroupInformation|: PrivilegedAction as:hive (auth:SIMPLE) from:org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659) 2019-10-03 12:04:10,411 [DEBUG] [main] |service.AbstractService|: Service: org.apache.tez.dag.app.DAGAppMaster entered state INITED 2019-10-03 12:04:10,430 [INFO] [main] |shim.HadoopShimsLoader|: Trying to locate HadoopShimProvider for hadoopVersion=3.1.1.3.1.0.0-78, majorVersion=3, minorVersion=1 2019-10-03 12:04:10,432 [DEBUG] [main] |shim.HadoopShimsLoader|: Trying HadoopShimProvider : org.apache.tez.hadoop.shim.HadoopShim28Provider 2019-10-03 12:04:10,432 [DEBUG] [main] |shim.HadoopShimsLoader|: Cannot pick org.apache.tez.hadoop.shim.HadoopShim28Provider as the HadoopShimProvider - returned null hadoop shim 2019-10-03 12:04:10,432 [INFO] [main] |shim.HadoopShimsLoader|: Picked HadoopShim org.apache.tez.hadoop.shim.DefaultHadoopShim, providerName=null, overrideProviderViaConfig=null, hadoopVersion=3.1.1.3.1.0.0-78, majorVersion=3, minorVersion=1 2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured TaskSchedulers: [0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService] 2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured ContainerLaunchers: [0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapContainerLauncher] 2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured TaskCommunicators: [0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator] 2019-10-03 12:04:10,499 [INFO] [main] |app.DAGAppMaster|: Comparing client version with AM version, clientVersion=0.9.1.3.1.0.0-78, AMVersion=0.9.1.3.1.0.0-78 2019-10-03 12:04:10,501 [DEBUG] [main] |utils.Simple2LevelVersionComparator|: Comparing versions version1=0.9, version2=0.9 2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: Looking for FS supporting hdfs 2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: looking for configuration option fs.hdfs.impl 2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: Looking in service filesystems for implementation class 2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: FS for hdfs is class org.apache.hadoop.hdfs.DistributedFileSystem 2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.use.legacy.blockreader.local = false 2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.read.shortcircuit = true 2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.domain.socket.data.traffic = false 2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket 2019-10-03 12:04:10,595 [DEBUG] [main] |hdfs.DFSClient|: Sets dfs.client.block.write.replace-datanode-on-failure.min-replication to X 2019-10-03 12:04:10,743 [DEBUG] [main] |token.Token|: Cloned private token Kind: HDFS_DELEGATION_TOKEN, Service: 172.23.207.43:8020, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, masterKeyId=350) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, masterKeyId=350) 2019-10-03 12:04:10,743 [DEBUG] [main] |hdfs.HAUtilClient|: Mapped HA service delegation token for logical URI hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703 to namenode host.dom.tld/ip:port 2019-10-03 12:04:10,743 [DEBUG] [main] |token.Token|: Cloned private token Kind: HDFS_DELEGATION_TOKEN, Service: ip:port, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, masterKeyId=350) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, masterKeyId=350) 2019-10-03 12:04:10,744 [DEBUG] [main] |hdfs.HAUtilClient|: Mapped HA service delegation token for logical URI hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703 to namenode host.dom.tld/ip:port 2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.use.legacy.blockreader.local = false 2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.read.shortcircuit = true 2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: dfs.client.domain.socket.data.traffic = false 2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket 2019-10-03 12:04:10,761 [DEBUG] [main] |retry.RetryUtils|: multipleLinearRandomRetry = null 2019-10-03 12:04:10,878 [DEBUG] [main] |ipc.Server|: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@4686afc2 2019-10-03 12:04:10,882 [DEBUG] [main] |ipc.Client|: getting client out of cache: org.apache.hadoop.ipc.Client@45099dd3 2019-10-03 12:04:11,961 [DEBUG] [main] |shortcircuit.DomainSocketFactory|: The short-circuit local reads feature is enabled. 2019-10-03 12:04:11,973 [DEBUG] [main] |sasl.DataTransferSaslUtil|: DataTransferProtocol using SaslPropertiesResolver, configured QOP dfs.data.transfer.protection = authentication,privacy, configured class dfs.data.transfer.saslproperties.resolver.class = class org.apache.hadoop.security.SaslPropertiesResolver 2019-10-03 12:04:12,004 [DEBUG] [main] |ipc.Client|: The ping interval is 60000 ms. 2019-10-03 12:04:12,005 [DEBUG] [main] |ipc.Client|: Connecting to host.dom.tld/ip:port 2019-10-03 12:04:12,022 [DEBUG] [main] |security.UserGroupInformation|: PrivilegedAction as:hive (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796) 2019-10-03 12:04:12,154 [DEBUG] [main] |security.SaslRpcClient|: Sending sasl message state: NEGOTIATE 2019-10-03 12:04:12,168 [DEBUG] [main] |endpoint.LlapPluginSecurityInfo|: Trying to get TokenInfo for interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB 2019-10-03 12:04:12,171 [DEBUG] [main] |security.SaslRpcClient|: Get token info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.token.TokenInfo(value=class org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSelector) 2019-10-03 12:04:12,179 [DEBUG] [main] |security.SaslRpcClient|: Creating SASL DIGEST-MD5(TOKEN) client to authenticate to service at default 2019-10-03 12:04:12,184 [DEBUG] [main] |security.SaslRpcClient|: Use TOKEN authentication for protocol ClientNamenodeProtocolPB 2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client callback: setting username: xyz 2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client callback: setting userPassword 2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client callback: setting realm: default 2019-10-03 12:04:12,191 [DEBUG] [main] |security.SaslRpcClient|: Sending sasl message state: INITIATE token: "charset=utf-8,username=\"xyz\",realm=\"default\",nonce=\"xyz",nc=00000001,cnonce=\"xyz",digest-uri=\"/default\",maxbuf=65536,response=xyz,qop=auth" auths { method: "TOKEN" mechanism: "DIGEST-MD5" protocol: "" serverId: "default" } 2019-10-03 12:04:12,194 [DEBUG] [main] |ipc.Client|: Negotiated QOP is :auth 2019-10-03 12:04:12,205 [DEBUG] [IPC Client (769530879) connection to host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection to host.dom.tld/ip:port from hive: starting, having connections 1 2019-10-03 12:04:12,213 [DEBUG] [IPC Parameter Sending Thread #0] |ipc.Client|: IPC Client (769530879) connection to host.dom.tld/ip:port from hive sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo 2019-10-03 12:04:12,214 [DEBUG] [IPC Client (769530879) connection to host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection to host.dom.tld/ip:port from hive got value #0 2019-10-03 12:04:12,214 [DEBUG] [main] |ipc.ProtobufRpcEngine|: Call: getFileInfo took 231ms 2019-10-03 12:04:12,320 [DEBUG] [IPC Parameter Sending Thread #0] |ipc.Client|: IPC Client (769530879) connection to host.dom.tld/ip:port from hive sending #1 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo 2019-10-03 12:04:12,321 [DEBUG] [IPC Client (769530879) connection to host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection to host.dom.tld/ip:port from hive got value #1 2019-10-03 12:04:12,321 [DEBUG] [main] |ipc.ProtobufRpcEngine|: Call: getFileInfo took 2ms 2019-10-03 12:04:12,342 [DEBUG] [main] |app.DAGAppMaster|: Stage directory information for AppAttemptId :appattempt_1570094873414_0006_000001 tezSystemStagingDir :hdfs:// cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006 recoveryDataDir :hdfs:// cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006/recovery recoveryAttemptDir :hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006/recovery/1 2019-10-03 12:04:12,360 [INFO] [main] |app.TaskCommunicatorManager|: Creating Default Task Communicator 2019-10-03 12:04:12,393 [INFO] [main] |Configuration.deprecation|: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled 2019-10-03 12:04:12,393 [INFO] [main] |app.TaskCommunicatorManager|: Creating Default Local Task Communicator 2019-10-03 12:04:12,411 [INFO] [main] |app.TaskCommunicatorManager|: Creating TaskCommunicator org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator:{} LLAP 2019-10-03 12:04:12,432 [INFO] [main] |Configuration.deprecation|: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address 2019-10-03 12:04:12,435 [INFO] [main] |tezplugins.LlapTaskCommunicator|: Task communicator with a token Kind: LLAP_TOKEN, Service: , Ident: (LLAP_TOKEN; LLAP_TOKEN owner=hive/host.dom....@dom.tld, renewer=hive, realUser=, issueDate=1570097036895, maxDate=1571306636895, sequenceNumber=205, masterKeyId=158144, cluster hive_llap0, app ID , signing true) 2019-10-03 12:04:12,435 [DEBUG] [main] |service.AbstractService|: noteFailure {}org.apache.tez.dag.api.TezUncheckedException: java.lang.reflect.InvocationTargetException 2019-10-03 12:04:12,441 [INFO] [main] |service.AbstractService|: Service org.apache.tez.dag.app.DAGAppMaster failed in state INITED org.apache.tez.dag.api.TezUncheckedException: java.lang.reflect.InvocationTargetException at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215) at org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184) at org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152) at org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1163) at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:536) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659) at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213) ... 11 more Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:159) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.<init>(LlapTaskCommunicator.java:165) ... 16 more 2019-10-03 12:04:12,445 [DEBUG] [main] |service.AbstractService|: Service: org.apache.tez.dag.app.DAGAppMaster entered state STOPPED 2019-10-03 12:04:12,454 [DEBUG] [main] |service.AbstractService|: noteFailure {}java.lang.NullPointerException 2019-10-03 12:04:12,454 [WARN] [main] |service.AbstractService|: When stopping the service org.apache.tez.dag.app.DAGAppMaster java.lang.NullPointerException at org.apache.tez.dag.app.DAGAppMaster.initiateStop(DAGAppMaster.java:2179) at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2195) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659) at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464) 2019-10-03 12:04:12,455 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster org.apache.tez.dag.api.TezUncheckedException: java.lang.reflect.InvocationTargetException at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215) at org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184) at org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152) at org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1163) at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:536) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659) at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213) ... 11 more Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:159) at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.<init>(LlapTaskCommunicator.java:165) ... 16 more 2019-10-03 12:04:12,464 [INFO] [shutdown-hook-0] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked 2019-10-03 12:04:12,465 [DEBUG] [shutdown-hook-0] |app.DAGAppMaster|: DAGAppMaster already stopped. Ignoring signal 2019-10-03 12:04:12,470 [DEBUG] [Thread-3] |util.ShutdownHookManager|: Completed shutdown in 0.006 seconds; Timeouts: 0 2019-10-03 12:04:12,489 [DEBUG] [Thread-3] |util.ShutdownHookManager|: ShutdownHookManger completed shutdown. Regards