It appears to be reading a union index and failing there. If the trace did not have any of the Pig AvroStorage frames in it, I could tell you more.
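
For context on what "reading a union index" involves: in Avro's binary encoding, a union value starts with a long (variable-length, zigzag-encoded) giving the zero-based position of the branch, followed by the value itself. The sketch below is only an illustration of that check, not the code of any Avro library; `read_long` and `read_union_index` are hypothetical names, and the error text merely imitates the Python traceback quoted below. An index of 64 against a 2-branch union usually suggests the bytes at that position are not really a union index, but that is an assumption about this file, not a diagnosis.

```python
import io


def read_long(stream):
    """Decode one Avro long: a little-endian base-128 varint, zigzag-encoded."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not (byte & 0x80):           # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)      # undo zigzag encoding


def read_union_index(stream, branch_count):
    """Read the branch index of a union and check it against the schema."""
    index = read_long(stream)
    if not 0 <= index < branch_count:
        raise ValueError(
            "Can't access branch index %d for union with %d branches"
            % (index, branch_count)
        )
    return index


# A ["string", "null"] union has 2 branches, so only 0 and 1 are valid.
print(read_union_index(io.BytesIO(b"\x02"), 2))   # byte 0x02 decodes to index 1

try:
    # Bytes 0x80 0x01 decode to 64, which no 2-branch union can contain.
    read_union_index(io.BytesIO(b"\x80\x01"), 2)
except ValueError as err:
    print(err)
```

All three readers in the quoted message fail at this same step, each reporting it in its own way.
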

What does the 'tojson' tool in avro-tools.jar do?
(java -jar avro-tools-1.6.3.jar tojson <file> | your_favorite_text_reader)

What version of Avro produced the Java stack trace below?

On 3/23/12 7:01 PM, "Russell Jurney" <[email protected]> wrote:

> I have a problem record I've written in Avro that crashes anything which tries
> to read it :(
>
> Can anyone make sense of these errors?
>
> The exception in Pig/AvroStorage is this:
>
>> java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
>> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>> at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
>> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
>> ... 7 more
>
> When reading the record in Python:
>
>> File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
>>   for record in df_reader:
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
>>   datum = self.datum_reader.read(self.datum_decoder)
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
>>   return self.read_data(self.writers_schema, self.readers_schema, decoder)
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
>>   return self.read_record(writers_schema, readers_schema, decoder)
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
>>   field_val = self.read_data(field.type, readers_field.type, decoder)
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
>>   return self.read_union(writers_schema, readers_schema, decoder)
>> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
>>   raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
>> avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
>
> When reading the record in Ruby:
>
>> /Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema ["string","null"] do not match. (Avro::IO::SchemaMatchException)
>
> --
> Russell Jurney twitter.com/rjurney <http://twitter.com/rjurney>
> [email protected] <mailto:[email protected]> datasyndrome.com <http://datasyndrome.com/>
