Some ideas: a String is encoded as a long length (a zig-zag varint), followed by that number of UTF-8 bytes. An empty string is therefore encoded as the number 0L -- which is a single byte, 0x00. It appears that the comparator is trying to skip a string or long, but it has hit the end of the byte[].
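To make that concrete, here is a minimal sketch of the encoding described above (hypothetical helper names, not Avro's actual classes): longs are zig-zag encoded into variable-length 7-bit chunks, and a string is its length as a long followed by the UTF-8 bytes, so the empty string comes out as the single byte 0x00:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AvroStringEncodingSketch {

    // Write a long as a zig-zag varint, as Avro's binary encoding does.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    // A string is its UTF-8 byte length as a long, followed by those bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] len = encodeLong(utf8.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(len, 0, len.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The empty string is exactly one byte: 0x00.
        System.out.println(Arrays.toString(encodeString("")));   // [0]
        System.out.println(Arrays.toString(encodeString("hi"))); // [4, 104, 105]
    }
}
```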
So either it is expecting a long or string to skip, and there is nothing there. Perhaps the empty String was not encoded as an empty string, but skipped entirely; or perhaps a long count or other number was left out (what is the Schema being compared?). WordCount is often key = word, val = count, so it would need to read the string word and skip the long count. If either of these is left out and not written, I would expect the sort of error below.

I hope that helps,
-Scott

On 9/1/11 5:42 AM, "Friso van Vollenhoven" <[email protected]> wrote:

>Hi All,
>
>I am working on a modified version of the Avro MapReduce support to make
>it play nice with the new Hadoop API (0.20.2). Most of the code is
>borrowed from the Avro mapred package, but I decided not to fully
>abstract away the Mapper and Reducer classes (like Avro does now using
>HadoopMapper and HadoopReducer classes). All else is much the same as the
>mapred implementation.
>
>When testing, I ran into an issue when emitting empty strings (empty
>Utf8) from the mapper as the key.
>I get the following:
>
>org.apache.avro.AvroRuntimeException: java.io.EOFException
>    at org.apache.avro.io.BinaryData.compare(BinaryData.java:74)
>    at org.apache.avro.io.BinaryData.compare(BinaryData.java:60)
>    at org.apache.avro.mapreduce.AvroKeyComparator.compare(AvroKeyComparator.java:45) <== this is my own code
>    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:120)
>    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:256)
>Caused by: java.io.EOFException
>    at org.apache.avro.io.BinaryDecoder.readLong(BinaryDecoder.java:182)
>    at org.apache.avro.generic.GenericDatumReader.skip(GenericDatumReader.java:389)
>    at org.apache.avro.io.BinaryData.compare(BinaryData.java:86)
>    at org.apache.avro.io.BinaryData.compare(BinaryData.java:72)
>    ... 8 more
>
>The root cause stack trace is as follows (taken from the debugger, with a
>breakpoint on the throw new EOFException(); line):
>
>Thread [Thread-11] (Suspended (breakpoint at line 182 in BinaryDecoder))
>    BinaryDecoder.readLong() line: 182
>    GenericDatumReader<D>.skip(Schema, Decoder) line: 389
>    BinaryData.compare(BinaryData$Decoders, Schema) line: 86
>    BinaryData.compare(byte[], int, int, byte[], int, int, Schema) line: 72
>    BinaryData.compare(byte[], int, byte[], int, Schema) line: 60
>    AvroKeyComparator<T>.compare(byte[], int, int, byte[], int, int) line: 45
>    Reducer$Context(ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>).nextKeyValue() line: 120
>    Reducer$Context(ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>).nextKey() line: 92
>    AvroMapReduceTest$WordCountingAvroReducer(Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>).run(Reducer<KEYIN,VALUEIN,KEYOUT,Context>) line: 175
>    ReduceTask.runNewReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator<INKEY>, Class<INKEY>, Class<INVALUE>) line: 572
>    ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 414
>    LocalJobRunner$Job.run() line: 256
>
>I went through the decoding code to see where this comes from, but I
>can't immediately spot where it goes wrong. My guess is that the actual
>problem occurs earlier during execution, where pos is possibly advanced
>too often.
>
>Has anyone experienced this? I can live without emitting empty keys from
>MR jobs, but I ran into this while implementing a word count job on a text
>file with empty lines (counting those could be a valid use case). I am
>using Avro 1.5.2.
>
>Thanks for any clues.
>
>
>Cheers,
>Friso
>
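Scott's hypothesis above -- the comparator reads the string key and then tries to skip a long that was never written -- can be reproduced in miniature. The following is a sketch with hypothetical helper names that mimics BinaryDecoder's varint reading and its EOF check; it is not Avro's actual code:

```java
import java.io.EOFException;

public class SkipPastEndSketch {

    // Read a zig-zag varint long from buf at pos[0], advancing pos[0].
    // Throws EOFException when the buffer runs out, like BinaryDecoder.readLong.
    static long readLong(byte[] buf, int[] pos) throws EOFException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            if (pos[0] >= buf.length) throw new EOFException();
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zig-zag
    }

    public static void main(String[] args) throws EOFException {
        // A (word, count) record: word = "" (one byte, 0x00), count = 3 (zigzag -> 0x06).
        byte[] complete = {0x00, 0x06};
        int[] pos = {0};
        long len = readLong(complete, pos);          // string length: 0, no bytes to skip
        pos[0] += (int) len;
        System.out.println(readLong(complete, pos)); // the count: 3

        // Truncated record: the writer emitted the empty string but never wrote the count.
        byte[] truncated = {0x00};
        pos[0] = 0;
        readLong(truncated, pos);                    // string length: 0
        try {
            readLong(truncated, pos);                // skip the count -> off the end
        } catch (EOFException e) {
            System.out.println("EOFException while skipping the long, as in the trace above");
        }
    }
}
```

If the writer omits the count for empty keys, the comparator's skip of the second field walks past the end of the serialized key bytes, which matches the reported EOFException inside GenericDatumReader.skip.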
