Hello,

It looks like representing Avro strings as Utf8 provides some interesting performance enhancements, but I'm wondering whether folks out there are actually using it in practice, or have run into any issues with it.
We have recently run into an issue where our Avro files, which represent strings as "avro.java.string", are causing ClassCastExceptions because Pig and Hive expect them to be Utf8. The exceptions occur when using avro-1.7.x.jar, but disappear when using avro-1.5.3.jar. I'm wondering whether this is something that should be addressed in the Avro jar, or in Pig and Hive as this thread suggests: https://issues.apache.org/jira/browse/PIG-3297

Here are the exceptions we are seeing:

*Hive:*
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
        at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeMap(AvroDeserializer.java:253)

*Pig:*
Caused by: java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)

Thanks.
-Mike
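P.S. In case it helps to reproduce this outside of Pig/Hive, here is a minimal sketch using the generic API (the "Example" schema, field name, and file name are just illustrative, not our actual data): it writes a record whose string field carries the "avro.java.string": "String" property, reads it back, and prints the runtime class of the value.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class StringTypeDemo {
  public static void main(String[] args) throws Exception {
    // String field annotated the same way our files are, via "avro.java.string".
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":["
        + "{\"name\":\"name\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}");

    // Write one record to a local file.
    File file = new File("example.avro");
    GenericRecord record = new GenericData.Record(schema);
    record.put("name", "mike");
    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, file);
    writer.append(record);
    writer.close();

    // Read it back with a plain GenericDatumReader and inspect the string value.
    DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
    GenericRecord result = reader.next();
    // With avro-1.7.x on the classpath this prints java.lang.String, which is the
    // value Pig/Hive then fail to cast to Utf8; with avro-1.5.3 it prints
    // org.apache.avro.util.Utf8 and the exceptions go away.
    System.out.println(result.get("name").getClass().getName());
    reader.close();
  }
}

That runtime class is the value that AvroDeserializer/AvroStorage end up casting to Utf8, which is where the exceptions above come from.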
