Hi all,

I'm currently running LLAP on a kerberized HDP cluster (v 3.1.0-78). When I 
run this query:

use default;
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
INSERT INTO employee VALUES (1, 'toto', 'toto', 'toto'), (2, 'titi', 'titi', 'titi');
select * from employee;

It fails on the INSERT statement: the DAGAppMaster seems to be in an 
unconfigured security context, because this check at line 165 of 
LlapTaskCommunicator.java fails on a kerberized cluster:
Preconditions.checkState((token != null) == UserGroupInformation.isSecurityEnabled());

In fact, I logged UserGroupInformation.isSecurityEnabled() and it returns 
false. So if I add:
UserGroupInformation.setConfiguration(conf);
Preconditions.checkState((token != null) == UserGroupInformation.isSecurityEnabled());

then the DAGAppMaster passes the INITED state, starts correctly, and the 
query executes successfully.
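To make the failure mode concrete, here is a minimal, self-contained sketch of that precondition (PreconditionSketch and tokenMatchesSecurity are illustrative names I made up, not the real Hadoop/Guava code):

```java
// Minimal sketch of the check at LlapTaskCommunicator.java:165. It shows why
// a present LLAP token combined with UGI reporting security as disabled
// ends up on the IllegalStateException path seen in the stack trace below.
public class PreconditionSketch {

    // Mimics the expression passed to Preconditions.checkState: the state is
    // consistent only when token presence matches the security flag.
    static boolean tokenMatchesSecurity(Object token, boolean securityEnabled) {
        return (token != null) == securityEnabled;
    }

    public static void main(String[] args) {
        Object llapToken = new Object();   // an LLAP token WAS passed (see the log)
        boolean securityEnabled = false;   // UGI initialized from a default config => SIMPLE

        if (!tokenMatchesSecurity(llapToken, securityEnabled)) {
            // Guava's Preconditions.checkState would throw IllegalStateException here
            System.out.println("inconsistent: token present but security disabled");
        }

        // Once UGI is given a configuration with Kerberos enabled,
        // isSecurityEnabled() returns true and the same check passes:
        System.out.println(tokenMatchesSecurity(llapToken, true)); // prints "true"
    }
}
```

In the real code the check is Guava's Preconditions.checkState, which throws a bare IllegalStateException — exactly what appears in the stack trace further down.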

I think it's related to this code in DAGAppMaster.java:
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L2365
      Configuration conf = new Configuration(new YarnConfiguration());

      ConfigurationProto confProto =
          TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name()));
      TezUtilsInternal.addUserSpecifiedTezConfiguration(conf, confProto.getConfKeyValuesList());

      AMPluginDescriptorProto amPluginDescriptorProto = null;
      if (confProto.hasAmPluginDescriptor()) {
        amPluginDescriptorProto = confProto.getAmPluginDescriptor();
      }

      UserGroupInformation.setConfiguration(conf);

If I understand correctly (I'm not a Java programmer), a default 
configuration is created and then passed to UGI, which results in SIMPLE 
authentication being seen as active, because it is on by default.
I don't think it's a misconfiguration: Kerberos is enabled and working for 
HDFS/YARN/Hive, and doAs is set to false for LLAP.
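For context: UserGroupInformation.isSecurityEnabled() only returns true once UGI has been handed a Configuration where hadoop.security.authentication is set to kerberos; Hadoop's built-in default is simple. So if the Configuration built in the AM never picks up this property from core-site.xml, SIMPLE auth is reported even on a secured cluster (illustrative excerpt):

```xml
<!-- core-site.xml excerpt (illustrative): without this property loaded into
     the Configuration passed to UGI, isSecurityEnabled() returns false -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```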

The application log of the failing YARN container:

2019-10-03 12:04:10,410 [DEBUG] [main] |security.UserGroupInformation|: 
PrivilegedAction as:hive (auth:SIMPLE) 
from:org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659)
2019-10-03 12:04:10,411 [DEBUG] [main] |service.AbstractService|: Service: 
org.apache.tez.dag.app.DAGAppMaster entered state INITED
2019-10-03 12:04:10,430 [INFO] [main] |shim.HadoopShimsLoader|: Trying to 
locate HadoopShimProvider for hadoopVersion=3.1.1.3.1.0.0-78, majorVersion=3, 
minorVersion=1
2019-10-03 12:04:10,432 [DEBUG] [main] |shim.HadoopShimsLoader|: Trying 
HadoopShimProvider : org.apache.tez.hadoop.shim.HadoopShim28Provider
2019-10-03 12:04:10,432 [DEBUG] [main] |shim.HadoopShimsLoader|: Cannot pick 
org.apache.tez.hadoop.shim.HadoopShim28Provider as the HadoopShimProvider - 
returned null hadoop shim
2019-10-03 12:04:10,432 [INFO] [main] |shim.HadoopShimsLoader|: Picked 
HadoopShim org.apache.tez.hadoop.shim.DefaultHadoopShim, providerName=null, 
overrideProviderViaConfig=null, hadoopVersion=3.1.1.3.1.0.0-78, majorVersion=3, 
minorVersion=1
2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured 
TaskSchedulers: 
[0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapTaskSchedulerService]
2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured 
ContainerLaunchers: 
[0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapContainerLauncher]
2019-10-03 12:04:10,495 [INFO] [main] |app.DAGAppMaster|: AM Level configured 
TaskCommunicators: 
[0:TezYarn:null],[1:TezUber:null],[2:LLAP:org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator]
2019-10-03 12:04:10,499 [INFO] [main] |app.DAGAppMaster|: Comparing client 
version with AM version, clientVersion=0.9.1.3.1.0.0-78, 
AMVersion=0.9.1.3.1.0.0-78
2019-10-03 12:04:10,501 [DEBUG] [main] |utils.Simple2LevelVersionComparator|: 
Comparing versions version1=0.9, version2=0.9
2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: Looking for FS 
supporting hdfs
2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: looking for 
configuration option fs.hdfs.impl
2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: Looking in service 
filesystems for implementation class
2019-10-03 12:04:10,505 [DEBUG] [main] |fs.FileSystem|: FS for hdfs is class 
org.apache.hadoop.hdfs.DistributedFileSystem
2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.use.legacy.blockreader.local = false
2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.read.shortcircuit = true
2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.domain.socket.data.traffic = false
2019-10-03 12:04:10,551 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2019-10-03 12:04:10,595 [DEBUG] [main] |hdfs.DFSClient|: Sets 
dfs.client.block.write.replace-datanode-on-failure.min-replication to X
2019-10-03 12:04:10,743 [DEBUG] [main] |token.Token|: Cloned private token 
Kind: HDFS_DELEGATION_TOKEN, Service: 172.23.207.43:8020, Ident: (token for 
hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, 
realUser=, issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, 
masterKeyId=350) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, 
Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, 
renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, 
sequenceNumber=7421, masterKeyId=350)
2019-10-03 12:04:10,743 [DEBUG] [main] |hdfs.HAUtilClient|: Mapped HA service 
delegation token for logical URI 
hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703
 to namenode host.dom.tld/ip:port
2019-10-03 12:04:10,743 [DEBUG] [main] |token.Token|: Cloned private token 
Kind: HDFS_DELEGATION_TOKEN, Service: ip:port, Ident: (token for hive: 
HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, renewer=yarn, realUser=, 
issueDate=1570097039552, maxDate=1570701839552, sequenceNumber=7421, 
masterKeyId=350) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster, 
Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host.dom....@dom.tld, 
renewer=yarn, realUser=, issueDate=1570097039552, maxDate=1570701839552, 
sequenceNumber=7421, masterKeyId=350)
2019-10-03 12:04:10,744 [DEBUG] [main] |hdfs.HAUtilClient|: Mapped HA service 
delegation token for logical URI 
hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703
 to namenode host.dom.tld/ip:port
2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.use.legacy.blockreader.local = false
2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.read.shortcircuit = true
2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.client.domain.socket.data.traffic = false
2019-10-03 12:04:10,744 [DEBUG] [main] |impl.DfsClientConf|: 
dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2019-10-03 12:04:10,761 [DEBUG] [main] |retry.RetryUtils|: 
multipleLinearRandomRetry = null
2019-10-03 12:04:10,878 [DEBUG] [main] |ipc.Server|: 
rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class 
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, 
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@4686afc2
2019-10-03 12:04:10,882 [DEBUG] [main] |ipc.Client|: getting client out of 
cache: org.apache.hadoop.ipc.Client@45099dd3
2019-10-03 12:04:11,961 [DEBUG] [main] |shortcircuit.DomainSocketFactory|: The 
short-circuit local reads feature is enabled.
2019-10-03 12:04:11,973 [DEBUG] [main] |sasl.DataTransferSaslUtil|: 
DataTransferProtocol using SaslPropertiesResolver, configured QOP 
dfs.data.transfer.protection = authentication,privacy, configured class 
dfs.data.transfer.saslproperties.resolver.class = class 
org.apache.hadoop.security.SaslPropertiesResolver
2019-10-03 12:04:12,004 [DEBUG] [main] |ipc.Client|: The ping interval is 60000 
ms.
2019-10-03 12:04:12,005 [DEBUG] [main] |ipc.Client|: Connecting to 
host.dom.tld/ip:port
2019-10-03 12:04:12,022 [DEBUG] [main] |security.UserGroupInformation|: 
PrivilegedAction as:hive (auth:SIMPLE) 
from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
2019-10-03 12:04:12,154 [DEBUG] [main] |security.SaslRpcClient|: Sending sasl 
message state: NEGOTIATE

2019-10-03 12:04:12,168 [DEBUG] [main] |endpoint.LlapPluginSecurityInfo|: 
Trying to get TokenInfo for interface 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB
2019-10-03 12:04:12,171 [DEBUG] [main] |security.SaslRpcClient|: Get token info 
proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB 
info:@org.apache.hadoop.security.token.TokenInfo(value=class 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSelector)
2019-10-03 12:04:12,179 [DEBUG] [main] |security.SaslRpcClient|: Creating SASL 
DIGEST-MD5(TOKEN)  client to authenticate to service at default
2019-10-03 12:04:12,184 [DEBUG] [main] |security.SaslRpcClient|: Use TOKEN 
authentication for protocol ClientNamenodeProtocolPB
2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client 
callback: setting username: xyz
2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client 
callback: setting userPassword
2019-10-03 12:04:12,189 [DEBUG] [main] |security.SaslRpcClient|: SASL client 
callback: setting realm: default
2019-10-03 12:04:12,191 [DEBUG] [main] |security.SaslRpcClient|: Sending sasl 
message state: INITIATE
token: 
"charset=utf-8,username=\"xyz\",realm=\"default\",nonce=\"xyz",nc=00000001,cnonce=\"xyz",digest-uri=\"/default\",maxbuf=65536,response=xyz,qop=auth"
auths {
  method: "TOKEN"
  mechanism: "DIGEST-MD5"
  protocol: ""
  serverId: "default"
}

2019-10-03 12:04:12,194 [DEBUG] [main] |ipc.Client|: Negotiated QOP is :auth
2019-10-03 12:04:12,205 [DEBUG] [IPC Client (769530879) connection to 
host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection 
to host.dom.tld/ip:port from hive: starting, having connections 1
2019-10-03 12:04:12,213 [DEBUG] [IPC Parameter Sending Thread #0] |ipc.Client|: 
IPC Client (769530879) connection to host.dom.tld/ip:port from hive sending #0 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
2019-10-03 12:04:12,214 [DEBUG] [IPC Client (769530879) connection to 
host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection 
to host.dom.tld/ip:port from hive got value #0
2019-10-03 12:04:12,214 [DEBUG] [main] |ipc.ProtobufRpcEngine|: Call: 
getFileInfo took 231ms
2019-10-03 12:04:12,320 [DEBUG] [IPC Parameter Sending Thread #0] |ipc.Client|: 
IPC Client (769530879) connection to host.dom.tld/ip:port from hive sending #1 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
2019-10-03 12:04:12,321 [DEBUG] [IPC Client (769530879) connection to 
host.dom.tld/ip:port from hive] |ipc.Client|: IPC Client (769530879) connection 
to host.dom.tld/ip:port from hive got value #1
2019-10-03 12:04:12,321 [DEBUG] [main] |ipc.ProtobufRpcEngine|: Call: 
getFileInfo took 2ms
2019-10-03 12:04:12,342 [DEBUG] [main] |app.DAGAppMaster|: Stage directory 
information for AppAttemptId :appattempt_1570094873414_0006_000001 
tezSystemStagingDir :hdfs:// 
cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006
 recoveryDataDir :hdfs:// 
cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006/recovery
 recoveryAttemptDir 
:hdfs://cluster/tmp/hive/hive/_tez_session_dir/47a0af7c-27a8-408e-99a3-ff1969c42703/.tez/application_1570094873414_0006/recovery/1
2019-10-03 12:04:12,360 [INFO] [main] |app.TaskCommunicatorManager|: Creating 
Default Task Communicator
2019-10-03 12:04:12,393 [INFO] [main] |Configuration.deprecation|: 
yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, 
use yarn.system-metrics-publisher.enabled
2019-10-03 12:04:12,393 [INFO] [main] |app.TaskCommunicatorManager|: Creating 
Default Local Task Communicator
2019-10-03 12:04:12,411 [INFO] [main] |app.TaskCommunicatorManager|: Creating 
TaskCommunicator org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator:{} 
LLAP
2019-10-03 12:04:12,432 [INFO] [main] |Configuration.deprecation|: 
yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2019-10-03 12:04:12,435 [INFO] [main] |tezplugins.LlapTaskCommunicator|: Task 
communicator with a token Kind: LLAP_TOKEN, Service: , Ident: (LLAP_TOKEN; 
LLAP_TOKEN owner=hive/host.dom....@dom.tld, renewer=hive, realUser=, 
issueDate=1570097036895, maxDate=1571306636895, sequenceNumber=205, 
masterKeyId=158144, cluster hive_llap0, app ID , signing true)
2019-10-03 12:04:12,435 [DEBUG] [main] |service.AbstractService|: noteFailure 
{}org.apache.tez.dag.api.TezUncheckedException: 
java.lang.reflect.InvocationTargetException
2019-10-03 12:04:12,441 [INFO] [main] |service.AbstractService|: Service 
org.apache.tez.dag.app.DAGAppMaster failed in state INITED
org.apache.tez.dag.api.TezUncheckedException: 
java.lang.reflect.InvocationTargetException
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152)
        at 
org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1163)
        at 
org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:536)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659)
        at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213)
        ... 11 more
Caused by: java.lang.IllegalStateException
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:159)
        at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.<init>(LlapTaskCommunicator.java:165)
        ... 16 more
2019-10-03 12:04:12,445 [DEBUG] [main] |service.AbstractService|: Service: 
org.apache.tez.dag.app.DAGAppMaster entered state STOPPED
2019-10-03 12:04:12,454 [DEBUG] [main] |service.AbstractService|: noteFailure 
{}java.lang.NullPointerException
2019-10-03 12:04:12,454 [WARN] [main] |service.AbstractService|: When stopping 
the service org.apache.tez.dag.app.DAGAppMaster
java.lang.NullPointerException
        at 
org.apache.tez.dag.app.DAGAppMaster.initiateStop(DAGAppMaster.java:2179)
        at 
org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2195)
        at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
        at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
        at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659)
        at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464)
2019-10-03 12:04:12,455 [ERROR] [main] |app.DAGAppMaster|: Error starting 
DAGAppMaster
org.apache.tez.dag.api.TezUncheckedException: 
java.lang.reflect.InvocationTargetException
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:215)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createTaskCommunicator(TaskCommunicatorManager.java:184)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.<init>(TaskCommunicatorManager.java:152)
        at 
org.apache.tez.dag.app.DAGAppMaster.createTaskCommunicatorManager(DAGAppMaster.java:1163)
        at 
org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:536)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2662)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at 
org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2659)
        at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2464)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.tez.dag.app.TaskCommunicatorManager.createCustomTaskCommunicator(TaskCommunicatorManager.java:213)
        ... 11 more
Caused by: java.lang.IllegalStateException
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:159)
        at 
org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.<init>(LlapTaskCommunicator.java:165)
        ... 16 more
2019-10-03 12:04:12,464 [INFO] [shutdown-hook-0] |app.DAGAppMaster|: 
DAGAppMasterShutdownHook invoked
2019-10-03 12:04:12,465 [DEBUG] [shutdown-hook-0] |app.DAGAppMaster|: 
DAGAppMaster already stopped. Ignoring signal
2019-10-03 12:04:12,470 [DEBUG] [Thread-3] |util.ShutdownHookManager|: 
Completed shutdown in 0.006 seconds; Timeouts: 0
2019-10-03 12:04:12,489 [DEBUG] [Thread-3] |util.ShutdownHookManager|: 
ShutdownHookManger completed shutdown.

Regards
