Strange AVRO 1.7.4 exception

Yong Zhang Thu, 21 Jan 2016 07:54:10 -0800
Hi, AVRO gurus:
I am facing a strange case in the AVRO right now, and hope someone here can at 
least give me some hints or a direction debugging it.
Our production cluster is CENTOS 6.5 (2.6.32-358.14.1.el6.x86_64), running IBM 
BigInsight V3.0.0.2. In Apache term, it is Hadoop 2.2.0 with MRV1(no yarn), and 
comes with AVRO 1.7.4, running with IBM J9 VM (build 2.7, JRE 1.7.0 Linux 
amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT enabled). Not 
sure if the JDK matters, but it is NOT Oracle JVM.
We have a ETL implemented in a chain of MR jobs. In one MR job, it is going to 
merge 2 sets of AVRO data. Dataset1 is in HDFS location A, and Dataset2 is in 
HDFS location B, and both contains the AVRO records binding to the same AVRO 
schema. The record contains an unique id field, and a timestamp field. The MR 
job is to merge the records based on the ID, and use the later timestamp record 
to replace previous timestamp record, and omit the final AVRO record out. Very 
straightforward.
Now we faced a problem that one reducer keeps failing with the following 
stacktrace on JobTracker:
java.lang.IndexOutOfBoundsException
        at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
        at java.io.DataInputStream.read(DataInputStream.java:160)
        at 
org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
        at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
        at 
org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
        at 
org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
        at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
        at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
        at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
        at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
        at 
org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
        at 
org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
        at 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
        at 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
        at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
        at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at 
java.security.AccessController.doPrivileged(AccessController.java:366)
        at javax.security.auth.Subject.doAs(Subject.java:572)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Here is the my Mapper and Reducer methods:Mapper:public void 
map(AvroKey<SpecificRecord> key, NullWritable value, Context context) throws 
IOException, InterruptedException Reducer:protected void 
reduce(CustomPartitionKeyClass key, Iterable<AvroValue<SpecificRecord>> values, 
Context context) throws IOException, InterruptedException 
What bother me are the following facts:1) All the mappers finish without 
error2) Most of the reducers finish without error, but one reducer keeps 
failing with the above error.3) It looks like caused by the data? But keep in 
mind that all the avro records passed the mapper side, but failed in one 
reducer. 4) From the stacktrace, it looks like our reducer code was NOT invoked 
yet, but failed before that. So my guess is that all the AVRO records pass 
through the mapper side, but AVRO complains the intermediate result generated 
by the one mapper? In my understanding, that will be a Sequence file generated 
by Hadoop, and value part will be the AVRO bytes. Is the above error meaning 
that AVRO cannot deserialize the value part from the sequence file?5) Our ETL 
run fine for more than one year, but suddenly got this error starting from one 
day, and kept getting this problem after that. 6) If it helps, here is the 
schema for the avro record:{
    "namespace" : "company name",
    "type" : "record",
    "name" : "Lists",
    "fields" : [
        {"name" : "account_id", "type" : "long"},
        {"name" : "list_id", "type" : "string"},
        {"name" : "sequence_id", "type" : ["int", "null"]} ,
        {"name" : "name", "type" : ["string", "null"]},
        {"name" : "state", "type" : ["string", "null"]},
        {"name" : "description", "type" : ["string", "null"]},
        {"name" : "dynamic_filtered_list", "type" : ["int", "null"]},
        {"name" : "filter_criteria", "type" : ["string", "null"]},
        {"name" : "created_at", "type" : ["long", "null"]},
        {"name" : "updated_at", "type" : ["long", "null"]},
        {"name" : "deleted_at", "type" : ["long", "null"]},
        {"name" : "favorite", "type" : ["int", "null"]},
        {"name" : "delta", "type" : ["boolean", "null"]},
        {
            "name" : "list_memberships", "type" : {
                "type" : "array", "items" : {
                    "name" : "ListMembership", "type" : "record",
                    "fields" : [
                        {"name" : "channel_id", "type" : "string"},
                        {"name" : "created_at", "type" : ["long", "null"]},
                        {"name" : "created_source", "type" : ["string", 
"null"]},
                        {"name" : "deleted_at", "type" : ["long", "null"]},
                        {"name" : "sequence_id", "type" : ["long", "null"]}
                    ]
                }
            }
        }
    ]
}
What I am looking for is that someone can help me understand what could cause 
this problem in the reducer side? From all the facts, it points to the data. As 
whole thing works for more than one year, but suddenly cannot merge the new 
coming data. But what kind of data cause the above error, or how should I debug 
this issue? The data generated as AVRO records just look fine and read in the 
mapper side without any issue, but some data cause this failure in the reducer 
side.
Thanks for your help.
Yong
Strange AVRO 1.7.4 exception

Reply via email to