Thanks for reporting this issue with deserialization of Configuration objects greater than 32MB. I have filed a JIRA on your behalf, https://issues.apache.org/jira/browse/TEZ-4142, which can be used to track this change in the Tez project. I have created patch requests against the 0.10.x and 0.9.x lines with a test case that demonstrates the failure above, so that we can make sure this is fixed permanently. As to why this failure occurred, it is related to the Hive configuration of split generation. For practical reasons I recommend a block-based approach to split generation (see the sketch below), but as to the specifics of which configuration is best in your case, I would reach out to the Hive community. While there is some overlap between our user lists, the Hive user list is the best choice for getting a targeted response to this.
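By block-based I mean bounding each grouped split by data size rather than pinning a fixed split count. As a rough sketch from the Hive CLI, assuming default Tez grouping behavior (the byte values below are illustrative only, not a recommendation for your cluster):

    -- Let Tez group splits by size instead of forcing a count:
    SET tez.grouping.min-size=134217728;    -- lower bound per grouped split (~128MB)
    SET tez.grouping.max-size=1073741824;   -- upper bound per grouped split (~1GB)

With the tez.grouping.split-count override removed, Tez sizes each grouped split between those bounds based on the input data.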
This error condition is certainly unique to have surfaced only now; billions and billions of other Tez jobs have been run without ever reporting this issue. Thanks again for your time.

Jon Eagles
Tez PMC Chair

On Fri, Apr 10, 2020 at 9:51 AM Rahul Chhiber <rahul.chhi...@6sense.com> wrote:

> Hi,
>
> I'm using Hive 2.0.1 with Tez 0.9.1. In a few cases, when I am querying an
> ORC table, I get the following error -
>
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1586321981777_24335_1_00,
> diagnostics=[Vertex vertex_1586321981777_24335_1_00 [Map 1] killed/failed
> due to:INIT_FAILURE, Fail to create InputInitializerManager,
> org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class
> with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>   at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
>   ...
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message was too large. May be malicious. Use
> CodedInputStream.setSizeLimit() to increase the size limit.
>   at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>   at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>   at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>   at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19294)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19258)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19360)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19355)
>   at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.parseFrom(DAGProtos.java:19552)
>   at org.apache.tez.common.TezUtils.createConfFromByteString(TezUtils.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:92)
>
> The table has 32 buckets, and before reading from it I am setting
> *SET tez.grouping.split-count = 32*. I don't understand why the
> ConfigurationProto object is growing so large as to exceed the Protobuf
> limit - can someone shed some light on this? And is there some resolution
> for this other than modifying our Tez build by explicitly setting
> *CodedInputStream.setSizeLimit()* ?
>
> Thanks,
> Rahul Chhiber
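P.S. On the specific question about CodedInputStream.setSizeLimit(): until a patched release is out, a workaround in a custom build would look roughly like the following. This is only a hedged sketch of the shape of the fix, not the committed TEZ-4142 change, and parseLargeConf is a hypothetical helper name:

    import java.io.IOException;
    import com.google.protobuf.ByteString;
    import com.google.protobuf.CodedInputStream;
    import org.apache.tez.dag.api.records.DAGProtos.ConfigurationProto;

    // Sketch: parse a large serialized Configuration by raising the
    // CodedInputStream message size limit before parsing, instead of
    // going through the default parseFrom(ByteString) path.
    static ConfigurationProto parseLargeConf(ByteString bytes) throws IOException {
        CodedInputStream in = CodedInputStream.newInstance(bytes.newInput());
        in.setSizeLimit(Integer.MAX_VALUE);  // lift the default size cap
        return ConfigurationProto.parseFrom(in);
    }

That said, the better long-term answer is to keep the Configuration from growing that large in the first place, which is why I suggest revisiting the split generation settings above.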