Hi,
I'm trying to run a simple AvroStorage example that reads a TSV file via
PigStorage and writes it out as Avro, but the job fails with the following
exception:
java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.generic.GenericData.getField(GenericData.java:470)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:102)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:244)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
The sample script is taken from this wiki section:
http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data#AvroStorage-PigsupportforAvrodata-A.Howtostoredataindifferentways
I'm using Pig trunk and Avro 1.6.0.
Has anyone encountered this, or does anyone know what the issue is? It seems
this use case isn't supported by the current version of AvroStorage, so there
is a bug in either the code or the documentation. The unit tests only verify
that Avro data read via AvroStorage can be written back out as Avro; there is
no test that goes from PigStorage to AvroStorage.
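For reference, the PigStorage-to-AvroStorage flow in question has roughly the
following shape. This is only a minimal sketch, not the exact wiki script: the
jar paths, the input and output locations, and the field names and types are
all placeholders.

```
-- Placeholder jar paths; AvroStorage may need further dependency jars
-- on the classpath in addition to these two.
REGISTER piggybank.jar;
REGISTER avro-1.6.0.jar;

-- Read a tab-separated file with a hypothetical two-column schema.
A = LOAD 'input.tsv' USING PigStorage('\t') AS (name:chararray, age:int);

-- Write the same relation back out as Avro.
STORE A INTO 'output_avro'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

The STORE step is where AvroStorage.putNext is invoked, which is the bottom
frame of the stack trace above.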
Thanks,
Bill