Hi.

We fixed the issue by patching protobuf-java-2.5.0.jar: we changed CodedInputStream.DEFAULT_SIZE_LIMIT to 1 GB. We uploaded the patched jar to our servers and set *tez.cluster.additional.classpath.prefix* (in tez-site.xml) to /path/to/patched/protobuf-java-2.5.0.jar:<original contents>. Please note that the patched jar must be the first entry on *tez.cluster.additional.classpath.prefix*. Apparently, Tez was using the default 64 MB protobuf message limit.
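For anyone applying the same workaround, the classpath change is a plain property in tez-site.xml. A sketch, assuming the patched jar was uploaded to /path/to/patched/ (the path is illustrative, and "<original contents>" stands for whatever value the property already had):

```xml
<!-- tez-site.xml: the patched jar must be the FIRST entry so it is picked up
     before the protobuf-java-2.5.0.jar bundled with Tez -->
<property>
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>/path/to/patched/protobuf-java-2.5.0.jar:<original contents></value>
</property>
```

After changing the property, the Tez application has to be restarted (new DAG App Masters) for the new classpath to take effect.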
BTW, in the latest protobuf version the limit is set to Integer.MAX_VALUE. See
https://github.com/protocolbuffers/protobuf/blob/v3.11.3/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L62-L65

Regards,
Bernard

On Mon, Feb 10, 2020 at 8:23 PM Bernard Quizon <bernard.qui...@cheetahdigital.com> wrote:

> Hi.
>
> We're using Hive 3.0.1 and we're currently experiencing this issue:
>
> Error while processing statement: FAILED: Execution Error, return code 2
> from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
> vertexName=Map 1, vertexId=vertex_1581309524541_0094_14_00,
> diagnostics=[Vertex vertex_1581309524541_0094_14_00 [Map 1] killed/failed
> due to:INIT_FAILURE, Fail to create InputInitializerManager,
> org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class
> with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>   at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
>   at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
>   ... 25 more
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message was too large. May be malicious. Use CodedInputStream.setSizeLimit()
> to increase the size limit.
>   at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>   at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>   at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>   at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19294)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19258)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19360)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19355)
>   at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.parseFrom(DAGProtos.java:19552)
>   at org.apache.tez.common.TezUtils.createConfFromByteString(TezUtils.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:130)
>   ... 30 more]
> Vertex killed, vertexName=Reducer 2, vertexId=vertex_1581309524541_0094_14_01,
> diagnostics=[Vertex received Kill in NEW state., Vertex
> vertex_1581309524541_0094_14_01 [Reducer 2] killed/failed due
> to:OTHER_VERTEX_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
>
> Our tables are in ORC format and we probably have 1k Hive tables in that
> cluster. I see the issues below, but I don't think any of them are related
> to what we're experiencing:
>
> https://issues.apache.org/jira/browse/HIVE-11592
> https://issues.apache.org/jira/browse/HIVE-11268
>
> I see that the *sizeLimit* for CodedInputStream is set to
> *PROTOBUF_MESSAGE_MAX_LIMIT* = *1073741824*.
> This issue just popped up a few days ago. Is there any workaround for
> this? And would you know how we can lower the protobuf message size per
> request?
>
> BTW, we've tried deleting the contents of TAB_COL_STATS but it didn't help.
>
> Thanks,
> Bernard
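P.S. For reference, the two limits discussed in this thread line up as follows. This is just a self-contained arithmetic check, not code from Tez or protobuf; the variable names are illustrative (in protobuf-java 2.5.0 the default lives in com.google.protobuf.CodedInputStream.DEFAULT_SIZE_LIMIT):

```java
public class ProtobufLimits {
    public static void main(String[] args) {
        // protobuf-java 2.5.0 ships with a 64 MiB default message size limit.
        int defaultSizeLimit = 64 << 20;
        // The PROTOBUF_MESSAGE_MAX_LIMIT value quoted above (1073741824)
        // is exactly 1 GiB, i.e. the value the patched jar uses.
        int patchedSizeLimit = 1 << 30;
        System.out.println("default = " + defaultSizeLimit); // 67108864
        System.out.println("patched = " + patchedSizeLimit); // 1073741824
    }
}
```

So any serialized Tez ConfigurationProto between 64 MiB and 1 GiB fails with the stock jar but parses with the patched one.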