Hi Cheolsoo,

Thanks for your reply! (Liang and I work together.) The restriction to "simple" union types is still there in the latest code; see lines 83-95 here:

https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java

I know that elephant-bird has full schema support for reading/writing protobuf (please correct me if I'm wrong). However, I am unaware of any alternative to AvroStorage for Pig. We may just need to implement it ourselves, since it is a blocker for using Pig with our Avro schemas. Perhaps someone on this list knows of another workaround?

Thanks,
stan

On Wed, Mar 26, 2014 at 2:17 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Liang,
>
> Does the new builtin AvroStorage work for you? I don't use Avro myself, so
> I cannot test it out. But it looks like that restriction is removed in the
> new AvroStorage. Here is the relevant code:
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java#L132
>
> Thanks,
> Cheolsoo
>
> On Tue, Mar 25, 2014 at 9:17 AM, Liliang Li <[email protected]> wrote:
>
> > Hi:
> >
> > I have a record of union type of
> >
> >     union {TypeA, TypeB, TypeC, TypeD, TypeE} mydata;
> >
> > I have the serialized data in avro format, however when I am trying to use
> > piggybank.jar's AvroStorage function to load the avro data, it gives me the
> > following error:
> >
> > Caused by: java.io.IOException: We don't accept schema containing
> > generic unions.
> >     at org.apache.pig.piggybank.storage.avro.AvroSchema2Pig.convert(AvroSchema2Pig.java:54)
> >     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:384)
> >     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
> >     ... 23 more
> >
> > So, after reading the piggybank source code here
> >
> > https://github.com/triplel/pig/blob/branch-0.12/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
> >
> >     /** determine whether a union is a nullable union;
> >      * note that this function doesn't check containing
> >      * types of the input union recursively. */
> >     public static boolean isAcceptableUnion(Schema in) {
> >         if (! in.getType().equals(Schema.Type.UNION))
> >             return false;
> >
> >         List<Schema> types = in.getTypes();
> >         if (types.size() <= 1) {
> >             return true;
> >         } else if (types.size() > 2) {
> >             return false; /*contains more than 2 types */
> >         } else {
> >             /* one of two types is NULL */
> >             return types.get(0).getType().equals(Schema.Type.NULL) ||
> >                    types.get(1).getType().equals(Schema.Type.NULL);
> >         }
> >     }
> >
> > basically piggybank's AvroStorage uses a function isAcceptableUnion(Schema
> > in) which does not support more than 2 union types.
> >
> > My question is:
> >
> > Does anyone know any work around to read avro document with arbitrary union
> > types in PIG?
> >
> > Any comments will be greatly appreciated.
> >
> > Liang
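
For anyone following the thread, the rule in the quoted isAcceptableUnion can be reproduced without the Avro dependency to see which unions it rejects. This is only a minimal sketch, not piggybank code: the class name UnionCheckSketch and its Type enum are hypothetical stand-ins for org.apache.avro.Schema.Type, and the method takes the union's branch types directly, so it assumes the "is this a union at all" check has already passed.

```java
import java.util.Arrays;
import java.util.List;

public class UnionCheckSketch {
    /** Hypothetical stand-in for org.apache.avro.Schema.Type; only a few members are modelled. */
    enum Type { NULL, INT, STRING, RECORD }

    /**
     * Mirrors the quoted rule: a union with at most one branch is accepted,
     * a union with more than two branches is rejected, and a two-branch
     * union is accepted only if one of its branches is NULL.
     */
    static boolean isAcceptableUnion(List<Type> branches) {
        if (branches.size() <= 1) {
            return true;
        } else if (branches.size() > 2) {
            return false;
        } else {
            return branches.get(0) == Type.NULL || branches.get(1) == Type.NULL;
        }
    }

    public static void main(String[] args) {
        // Nullable union such as ["null", "string"]: accepted.
        System.out.println(isAcceptableUnion(Arrays.asList(Type.NULL, Type.STRING)));
        // Two non-null branches: rejected.
        System.out.println(isAcceptableUnion(Arrays.asList(Type.INT, Type.STRING)));
        // Five record branches, like union {TypeA, ..., TypeE}: rejected.
        System.out.println(isAcceptableUnion(Arrays.asList(
                Type.RECORD, Type.RECORD, Type.RECORD, Type.RECORD, Type.RECORD)));
    }
}
```

Under this rule the five-way union from Liang's schema is rejected on branch count alone, before the branch types are even inspected, which is consistent with the "generic unions" exception in the stack trace.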
