Thanks for reporting this issue with deserialization of Configuration objects greater than 32MB. I have filed a JIRA on your behalf, https://issues.apache.org/jira/browse/TEZ-4142, which can be used to track this change in the Tez project. I have created patch requests against the 0.10.x and 0.9.x lines with a test case that demonstrates the failure above, so that we can make sure this is fixed permanently. As to why this failure occurred, it is related to the Hive configuration of split generation. For practical reasons I recommend a block-based approach to split generation (see the sketch below), but as to the specifics of which configuration is best in your case, I would reach out to the Hive community. While there is some overlap between our user lists, the Hive user list is the best choice for getting a targeted response to this.
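By block-based I mean bounding each grouped split by data size rather than pinning a fixed split count. As a rough sketch from the Hive CLI, assuming default Tez grouping behavior (the byte values below are illustrative only, not a recommendation for your cluster):

    -- Let Tez group splits by size instead of forcing a count:
    SET tez.grouping.min-size=134217728;    -- lower bound per grouped split (~128MB)
    SET tez.grouping.max-size=1073741824;   -- upper bound per grouped split (~1GB)

With the tez.grouping.split-count override removed, Tez sizes each grouped split between those bounds based on the input data.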
This error condition is certainly unique to have surfaced only now; billions and billions of other Tez jobs have been run without ever reporting this issue. Thanks again for your time.

Jon Eagles
Tez PMC Chair

On Fri, Apr 10, 2020 at 9:51 AM Rahul Chhiber <rahul.chhi...@6sense.com> wrote:

> Hi,
>
> I'm using Hive 2.0.1 with Tez 0.9.1. In a few cases, when I am querying an
> ORC table, I get the following error -
>
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1586321981777_24335_1_00,
> diagnostics=[Vertex vertex_1586321981777_24335_1_00 [Map 1] killed/failed
> due to:INIT_FAILURE, Fail to create InputInitializerManager,
> org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class
> with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>   at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
>   ...
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message was too large. May be malicious. Use
> CodedInputStream.setSizeLimit() to increase the size limit.
>   at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>   at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>   at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>   at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19294)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19258)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19360)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19355)
>   at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.parseFrom(DAGProtos.java:19552)
>   at org.apache.tez.common.TezUtils.createConfFromByteString(TezUtils.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:92)
>
> The table has 32 buckets, and before reading from it I am setting
> *SET tez.grouping.split-count = 32*. I don't understand why the
> ConfigurationProto object is growing so large as to exceed the Protobuf
> limit - can someone shed some light on this? And is there some resolution
> for this other than modifying our Tez build by explicitly setting
> *CodedInputStream.setSizeLimit()* ?
>
> Thanks,
> Rahul Chhiber
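P.S. On the specific question about CodedInputStream.setSizeLimit(): until a patched release is out, a workaround in a custom build would look roughly like the following. This is only a hedged sketch of the shape of the fix, not the committed TEZ-4142 change, and parseLargeConf is a hypothetical helper name:

    import java.io.IOException;
    import com.google.protobuf.ByteString;
    import com.google.protobuf.CodedInputStream;
    import org.apache.tez.dag.api.records.DAGProtos.ConfigurationProto;

    // Sketch: parse a large serialized Configuration by raising the
    // CodedInputStream message size limit before parsing, instead of
    // going through the default parseFrom(ByteString) path.
    static ConfigurationProto parseLargeConf(ByteString bytes) throws IOException {
        CodedInputStream in = CodedInputStream.newInstance(bytes.newInput());
        in.setSizeLimit(Integer.MAX_VALUE);  // lift the default size cap
        return ConfigurationProto.parseFrom(in);
    }

That said, the better long-term answer is to keep the Configuration from growing that large in the first place, which is why I suggest revisiting the split generation settings above.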