I have a problem record I've written in Avro that crashes anything which tries to read it :(
Can anyone make sense of these errors?

The exception in Pig/AvroStorage is this:

java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
    ... 7 more

When reading the record in Python:

  File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
    raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches

When reading the record in Ruby:

/Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema ["string","null"] do not match. (Avro::IO::SchemaMatchException)

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
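
P.S. In case it helps narrow things down, here is a minimal sketch of how I understand the union branch index gets read. The Avro binary encoding stores a union as a zigzag varint long (the branch index) followed by the value for that branch. The function names below are my own for illustration, not Avro's API, and the idea that the reader is landing on bytes that were never meant to be a branch index is only a guess on my part.

```python
def zigzag_decode(n):
    # Avro longs are zigzag-encoded: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (n >> 1) ^ -(n & 1)


def read_long(buf, pos=0):
    """Read one variable-length zigzag long from buf, starting at pos.

    Returns (value, next_pos). Hypothetical helper, not part of the avro package.
    """
    buf = bytearray(buf)
    acc = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift  # low 7 bits carry data
        if not (b & 0x80):          # high bit clear marks the last byte
            break
        shift += 7
    return zigzag_decode(acc), pos


def read_union_index(buf, pos, num_branches):
    """Read a union's branch index and check it against the schema's branch count."""
    index, pos = read_long(buf, pos)
    if not 0 <= index < num_branches:
        raise ValueError(
            "Can't access branch index %d for union with %d branches"
            % (index, num_branches)
        )
    return index, pos
```

For a ["string","null"] union the only valid index bytes are 0x00 (branch 0) and 0x02 (branch 1). The two bytes 0x80 0x01 decode to 64, which is the value all three readers report, so if bytes like that sit where the index should be, the record was probably written with a different layout than the schema the readers are using, or the stream is off by some bytes at that point.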
