Hi:
I have a record with a field of the following union type:
union {TypeA, TypeB, TypeC, TypeD, TypeE} mydata;
I have the serialized data in Avro format. However, when I try to use
piggybank.jar's AvroStorage function to load it, I get the following
error:
Caused by: java.io.IOException: We don't accept schema containing generic unions.
    at org.apache.pig.piggybank.storage.avro.AvroSchema2Pig.convert(AvroSchema2Pig.java:54)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:384)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
    ... 23 more
So I read the piggybank source code, which contains the following function:
https://github.com/triplel/pig/blob/branch-0.12/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
    /** determine whether a union is a nullable union;
     * note that this function doesn't check containing
     * types of the input union recursively. */
    public static boolean isAcceptableUnion(Schema in) {
        if (! in.getType().equals(Schema.Type.UNION))
            return false;
        List<Schema> types = in.getTypes();
        if (types.size() <= 1) {
            return true;
        } else if (types.size() > 2) {
            return false; /* contains more than 2 types */
        } else {
            /* one of two types is NULL */
            return types.get(0).getType().equals(Schema.Type.NULL) ||
                   types.get(1).getType().equals(Schema.Type.NULL);
        }
    }
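Basically, piggybank's AvroStorage uses the function isAcceptableUnion(Schema
in), which accepts a union only if it has at most one branch, or exactly two
branches with one of them null. A union of more than two types, like mine, is
always rejected.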
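To make the rule concrete, here is a minimal standalone sketch of the same branch-count logic. It does not use the Avro library: the Kind enum and the UnionCheckSketch class are hypothetical stand-ins I made up for Schema.Type and AvroStorageUtils, so this only illustrates the check and is not the piggybank code itself.

```java
import java.util.List;

public class UnionCheckSketch {
    /** Hypothetical stand-in for Avro's Schema.Type, for illustration only. */
    enum Kind { NULL, RECORD, STRING, INT }

    /**
     * Mirrors the branch-count rule of isAcceptableUnion for a list of
     * union branches: 0 or 1 branch is accepted, more than 2 is rejected,
     * and exactly 2 is accepted only if one of them is NULL.
     */
    static boolean isAcceptableUnion(List<Kind> branches) {
        if (branches.size() <= 1) {
            return true;
        } else if (branches.size() > 2) {
            return false; // more than two branches, like my five-type union
        } else {
            return branches.get(0) == Kind.NULL || branches.get(1) == Kind.NULL;
        }
    }

    public static void main(String[] args) {
        // A nullable union such as union {null, TypeA}
        System.out.println(isAcceptableUnion(List.of(Kind.NULL, Kind.RECORD)));
        // A union of five record types, like union {TypeA, ..., TypeE}
        System.out.println(isAcceptableUnion(
                List.of(Kind.RECORD, Kind.RECORD, Kind.RECORD, Kind.RECORD, Kind.RECORD)));
        // Two branches, neither of them null
        System.out.println(isAcceptableUnion(List.of(Kind.RECORD, Kind.STRING)));
    }
}
```

Running this prints true, false, false, so as far as I can tell only nullable unions of the form union {null, T} (in either order) get through, and my five-branch union falls into the second case.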
My question is:
Does anyone know of a workaround for reading Avro data with arbitrary union
types in Pig?
Any comments will be greatly appreciated.
Liang