Hello all, I'm running an LLAP daemon through YARN + ZK. The container for a Hive query starts to execute, but then it hits a ClassCastException that I don't know how to debug. Here are the logs:
cat syslog_dag_<container_id>
---------------------------------------------------
...
2019-11-11 17:32:02,631 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: Assigned #1, task=TaskInfo{task=attempt_1573233179705_0050_1_00_000000_0, priority=5, startTime=0, containerId=null, uniqueId=0, localityDelayTimeout=0} on node={hostname:43033, id=d84432aa-f08f-467d-8688-c9150430f05e, canAcceptTask=true, st=0, ac=12, commF=false, disabled=false}, to container=container_222212222_0050_01_000001
2019-11-11 17:32:02,631 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering attempt_1573233179705_0050_1_00_000000_0; false
2019-11-11 17:32:02,648 [INFO] [TaskSchedulerAppCallbackExecutor #0] |node.PerSourceNodeTracker|: Adding new node hostname:43033 to nodeTracker 2
2019-11-11 17:32:02,680 [INFO] [Dispatcher thread {Central}] |tezplugins.LlapTaskCommunicator|: CurrentDagId set to: 1, name=select count(device_id) from ...'impression' (Stage-1), queryId=root_20191111173153_2e979533-4d13-4b66-a0a5-fd7d48c07e2f
2019-11-11 17:32:02,680 [INFO] [Dispatcher thread {Central}] |tezplugins.LlapTaskCommunicator|: Added new known node: hostname:43033
2019-11-11 17:32:02,721 [INFO] [Dispatcher thread {Central}] |HistoryEventHandler.criticalEvents|: [HISTORY][DAG:N/A][Event:CONTAINER_LAUNCHED]: containerId=container_222212222_0050_01_000001, launchTime=1573493522721
2019-11-11 17:32:02,722 [INFO] [TaskCommunicator # 0] |impl.LlapProtocolClientImpl|: Creating protocol proxy as null
2019-11-11 17:32:02,722 [INFO] [Dispatcher thread {Central}] |impl.TaskAttemptImpl|: TaskAttempt: [attempt_1573233179705_0050_1_00_000000_0] submitted. Is using containerId: [container_222212222_0050_01_000001] on NM: [hostname:43033]
2019-11-11 17:32:02,723 [INFO] [Dispatcher thread {Central}] |HistoryEventHandler.criticalEvents|: [HISTORY][DAG:dag_1573233179705_0050_1][Event:TASK_ATTEMPT_STARTED]: vertexName=Map 1, taskAttemptId=attempt_1573233179705_0050_1_00_000000_0, startTime=1573493522722, containerId=container_222212222_0050_01_000001, nodeId=hostname:43033
2019-11-11 17:32:02,823 [INFO] [TaskCommunicator # 0] |tezplugins.LlapTaskCommunicator|: Failed to run task: attempt_1573233179705_0050_1_00_000000_0 on containerId: container_222212222_0050_01_000001
org.apache.hadoop.ipc.RemoteException(java.lang.ClassCastException): org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2 cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.BlockingService
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:510)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
    at org.apache.hadoop.ipc.Client.call(Client.java:1491)
    at org.apache.hadoop.ipc.Client.call(Client.java:1388)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy50.submitWork(Unknown Source)
    at org.apache.hadoop.hive.llap.impl.LlapProtocolClientImpl.submitWork(LlapProtocolClientImpl.java:81)
    at org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy$SubmitWorkCallable.call(LlapProtocolClientProxy.java:99)
    at org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy$SubmitWorkCallable.call(LlapProtocolClientProxy.java:89)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-11-11 17:32:02,828 [INFO] [Dispatcher thread {Central}] |HistoryEventHandler.criticalEvents|: [HISTORY][DAG:dag_1573233179705_0050_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1573233179705_0050_1_00_000000_0, creationTime=1573493522614, allocationTime=1573493522672, startTime=1573493522722, finishTime=1573493522826, timeTaken=104, status=FAILED, taskFailureType=NON_FATAL, errorEnum=UNKNOWN_ERROR, diagnostics=org.apache.hadoop.ipc.RemoteException(java.lang.ClassCastException): org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2 cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.BlockingService
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:510)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
, nodeHttpAddress=http://hostname:15002, counters=Counters: 1, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
2019-11-11 17:32:02,832 [INFO] [Dispatcher thread {Central}] |impl.TaskImpl|: Scheduling new attempt for task: task_1573233179705_0050_1_00_000000, currentFailedAttempts: 1, maxFailedAttempts: 4
...
---------------------------------------------------

The task is then retried, and it fails for good on the 4th attempt.

Is this a jar version mismatch, a protobuf mismatch, a classpath error, or something else? Let me know what other information I should provide. Any help is much appreciated!

Software versions:
- Hadoop 3.2.1
- Tez 0.9.2
- Hive 3.1.2