Hi Cheolsoo,

Thanks for your reply! (Liang and I work together.) The restriction to
"simple" union types is still present in the latest code; see lines 83-95
here:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java

I believe elephant-bird has full schema support for reading and writing
protobuf (please correct me if I'm wrong). However, I am not aware of any
alternative to AvroStorage for Pig. We may have to implement the missing
support ourselves, since this restriction is a blocker for using Pig with
our Avro schemas. Perhaps someone on this list knows another workaround?
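If we do end up implementing it ourselves, one possible approach (my own
assumption, not something either AvroStorage does today) is to flatten each
general union into a fixed-width "tagged" row: field 0 holds the index of the
branch that is present, and there is one nullable slot per branch. The helper
name flattenUnion below is hypothetical, and the sketch deliberately avoids
the Pig and Avro APIs; it only shows the row layout such a loader would emit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TaggedUnionSketch {
    /**
     * Hypothetical helper: flattens one value of an N-branch union into a
     * row of N + 1 fields. Field 0 is the index of the branch that was
     * present; field (i + 1) holds the value for branch i, and every other
     * branch slot is null.
     */
    static List<Object> flattenUnion(int branchIndex, Object value, int branchCount) {
        if (branchIndex < 0 || branchIndex >= branchCount) {
            throw new IllegalArgumentException("branch index out of range: " + branchIndex);
        }
        // Start with an all-null row of fixed width, then fill in tag and value.
        List<Object> row = new ArrayList<>(Collections.<Object>nCopies(branchCount + 1, null));
        row.set(0, branchIndex);
        row.set(branchIndex + 1, value);
        return row;
    }

    public static void main(String[] args) {
        // A value of the third branch (index 2, e.g. TypeC) of a five-branch union.
        System.out.println(flattenUnion(2, "payload-of-TypeC", 5));
        // prints [2, null, null, payload-of-TypeC, null, null]
    }
}
```

Because every record produces a row of the same width, the union field could
be described by a single Pig tuple schema (an int tag plus one nullable field
per branch). Working out which branch a given Avro datum belongs to would be
the loader's job and is not shown here.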

Thanks,

stan


On Wed, Mar 26, 2014 at 2:17 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Liang,
>
> Does the new built-in AvroStorage work for you? I don't use Avro myself, so
> I cannot test it. But it looks like that restriction has been removed in
> the new AvroStorage. Here is the relevant code:
>
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java#L132
>
> Thanks,
> Cheolsoo
>
>
> On Tue, Mar 25, 2014 at 9:17 AM, Liliang Li <[email protected]> wrote:
>
> > Hi:
> >
> > I have a record of union type of
> >
> > union {TypeA, TypeB, TypeC, TypeD, TypeE} mydata;
> >
> > I have the serialized data in Avro format. However, when I try to use
> > piggybank.jar's AvroStorage function to load the Avro data, it gives me
> > the following error:
> >
> > Caused by: java.io.IOException: We don't accept schema containing generic unions.
> >     at org.apache.pig.piggybank.storage.avro.AvroSchema2Pig.convert(AvroSchema2Pig.java:54)
> >     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:384)
> >     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
> >     ... 23 more
> >
> > So I read the piggybank source code here:
> >
> > https://github.com/triplel/pig/blob/branch-0.12/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
> >
> >     /** determine whether a union is a nullable union;
> >      * note that this function doesn't check containing
> >      * types of the input union recursively. */
> >     public static boolean isAcceptableUnion(Schema in) {
> >         if (! in.getType().equals(Schema.Type.UNION))
> >             return false;
> >
> >         List<Schema> types = in.getTypes();
> >         if (types.size() <= 1) {
> >             return true;
> >         } else if (types.size() > 2) {
> >             return false; /* contains more than 2 types */
> >         } else {
> >             /* one of two types is NULL */
> >             return types.get(0).getType().equals(Schema.Type.NULL) ||
> >                    types.get(1).getType().equals(Schema.Type.NULL);
> >         }
> >     }
> >
> > It turns out that piggybank's AvroStorage uses a function,
> > isAcceptableUnion(Schema in), that rejects any union with more than two
> > types; a two-type union is accepted only if one of its branches is null.
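To make that rule concrete, here is a small self-contained sketch that
mirrors the decision logic of the isAcceptableUnion excerpt quoted above. It
uses a hypothetical BranchType enum in place of org.apache.avro.Schema.Type so
it runs without Avro on the classpath, and it takes the union's branch list
directly, so the initial "is this a UNION" check is omitted.

```java
import java.util.Arrays;
import java.util.List;

public class UnionCheckSketch {
    // Hypothetical stand-in for org.apache.avro.Schema.Type, trimmed to the
    // branch kinds used in the examples below.
    enum BranchType { NULL, INT, STRING, RECORD }

    // Same decision rule as piggybank's AvroStorageUtils.isAcceptableUnion,
    // applied directly to a union's list of branch types.
    static boolean isAcceptableUnion(List<BranchType> types) {
        if (types.size() <= 1) {
            return true;   // zero or one branch: accepted
        } else if (types.size() > 2) {
            return false;  // more than two branches: always rejected
        } else {
            // exactly two branches: accepted only if one of them is null
            return types.get(0) == BranchType.NULL
                    || types.get(1) == BranchType.NULL;
        }
    }

    public static void main(String[] args) {
        // ["null", "string"]: a nullable string
        System.out.println(isAcceptableUnion(
                Arrays.asList(BranchType.NULL, BranchType.STRING)));
        // ["int", "string"]: two non-null branches
        System.out.println(isAcceptableUnion(
                Arrays.asList(BranchType.INT, BranchType.STRING)));
        // ["null", "int", "string"]: three branches, even though one is null
        System.out.println(isAcceptableUnion(
                Arrays.asList(BranchType.NULL, BranchType.INT, BranchType.STRING)));
        // five record branches, shaped like union {TypeA, ..., TypeE}
        System.out.println(isAcceptableUnion(Arrays.asList(
                BranchType.RECORD, BranchType.RECORD, BranchType.RECORD,
                BranchType.RECORD, BranchType.RECORD)));
    }
}
```

This prints true, false, false, false: only the nullable single-type union
passes, so a five-branch union like ours can never be accepted by piggybank's
AvroStorage, with or without a null branch.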
> >
> > My question is:
> >
> > Does anyone know a workaround for reading Avro data with arbitrary
> > union types in Pig?
> >
> >
> > Any comments will be greatly appreciated.
> >
> >
> > Liang
> >
>
