Ah, I guess the CHANGES.txt file isn't just for changes on the current branch.

-Kim

On Sun, Jun 19, 2011 at 2:15 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> The jira is for Pig 0.2... it's been that way for a while. The whole
> twoLevelAccess thing was removed in 0.9+, though. I am wondering if
> that changes how we need to declare schemas, or if perhaps it wasn't
> removed all the way.
>
> On Sat, Jun 18, 2011 at 9:32 AM, Kim Vogt <k...@simplegeo.com> wrote:
> > This jira appears to be relevant
> > https://issues.apache.org/jira/browse/PIG-449
> >
> > -Kim
> >
> > On Sat, Jun 18, 2011 at 9:25 AM, Kim Vogt <k...@simplegeo.com> wrote:
> >
> >> To clarify, I get no errors when I run with the old TOBAG, the one in
> >> the gist without the outputSchema.
> >>
> >> -Kim
> >>
> >>
> >> On Sat, Jun 18, 2011 at 9:23 AM, Kim Vogt <k...@simplegeo.com> wrote:
> >>
> >>> I think it has something to do with the TOBAG UDF
> >>> (https://gist.github.com/1033242). When I run:
> >>>
> >>> a = load 'x.txt' as (a:long, b:long);
> >>> -- dump a;
> >>>
> >>> x = foreach a generate a, TOBAG(a, b) as abag;
> >>> y = foreach x generate TOTUPLE(a, abag) as atuple;
> >>> z = foreach y generate atuple.abag;
> >>> describe z;
> >>> dump z;
> >>>
> >>> I get no errors.
> >>>
> >>> grunt> describe z
> >>>
> >>> z: {abag: {null}}
> >>>
> >>> grunt> dump z
> >>>
> >>> 2011-06-18 09:18:34,106 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
> >>>
> >>> ({})
> >>>
> >>> ({})
> >>>
> >>> ({})
> >>>
> >>> Hope that helps!
> >>>
> >>> -Kim
> >>>
> >>> On Sat, Jun 18, 2011 at 8:05 AM, Gianmarco <gianmarco....@gmail.com> wrote:
> >>>
> >>>> Hi Dmitriy,
> >>>>
> >>>> Unfortunately I don't have a solution, but I can report something
> >>>> related. I don't know exactly why there was a change between 0.8 and
> >>>> 0.9, but I also noticed that complex schemas are now handled
> >>>> differently.
> >>>> For example:
> >>>>
> >>>> In 0.8 I would use:
> >>>> raw = LOAD '$input' AS (username:chararray,
> >>>> topics:bag{t(topic:chararray)},
> >>>> links:bag{t(link:chararray)});
> >>>>
> >>>> But in 0.9 this doesn't work, and I need to strip that extra 't' from
> >>>> the bags:
> >>>> raw = LOAD '$input' AS (username:chararray,
> >>>> topics:bag{(topic:chararray)},
> >>>> links:bag{(link:chararray)});
> >>>>
> >>>> I assume this is the same problem, and that the change was actually
> >>>> introduced in 0.8.1.
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Gianmarco De Francisci Morales
> >>>>
> >>>>
> >>>> On Sat, Jun 18, 2011 at 16:42, Dmitriy Ryaboy <dvrya...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi folks,
> >>>> > We've migrated to Pig 0.8.1, and everything went pretty smoothly
> >>>> > except for one oddity involving how we generate schemas for complex
> >>>> > Thrift structures: it seems we now get into trouble when our
> >>>> > tuple contains lists.
> >>>> >
> >>>> > The gory details are in
> >>>> > https://github.com/kevinweil/elephant-bird/issues/60 but here's a
> >>>> > summary. Any help, or pointers to relevant Jiras, would be much
> >>>> > appreciated.
> >>>> >
> >>>> > In 0.8.1, reading that kind of structure seems to be broken
> >>>> > altogether; this fails:
> >>>> >
> >>>> > a = load 'x.txt' as (a:long, b:long);
> >>>> > -- dump a;
> >>>> >
> >>>> > x = foreach a generate a, TOBAG(a, b) as abag;
> >>>> > y = foreach x generate TOTUPLE(a, abag) as atuple;
> >>>> > z = foreach y generate atuple.abag;
> >>>> > describe z;
> >>>> > dump z;
> >>>> >
> >>>> > In trunk, the above snippet works, but loading a relation with a
> >>>> > schema we generate for the following thrift definition does not work
> >>>> > (0.6 and 0.8 don't complain):
> >>>> > struct LogEvent {
> >>>> > 1: optional EventDetails event_details
> >>>> > }
> >>>> >
> >>>> > struct EventDetails {
> >>>> > 1: optional list<i64> item_ids
> >>>> > }
> >>>> >
> >>>> > The error:
> >>>> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2218: Invalid resource schema: bag schema must have tuple as its field
> >>>> >   at org.apache.pig.ResourceSchema$ResourceFieldSchema.throwInvalidSchemaException(ResourceSchema.java:213)
> >>>> >   at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1881)
> >>>> >   at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1871)
> >>>> >   at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
> >>>> >
> >>>> >
> >>>> > And some of Raghu's comments:
> >>>> > There is some stripping going on in Pig trunk. I added the following
> >>>> > log line to Schema.java (line 1986 in 0.8 and line 1880 in trunk):
> >>>> > log.info("XXX : bag schema : " + rfs + " inner : " + innerFs);
> >>>> >
> >>>> > With the same EB (elephant-bird) jar:
> >>>> >
> >>>> > On trunk :
> >>>> > bag schema : item_ids:{item_ids_tuple:long} inner : item_ids_tuple: long
> >>>> > followed by the error shown above.
> >>>> >
> >>>> > On 0.8 :
> >>>> > bag schema : item_ids:{t:(item_ids_tuple:long)} inner : t: tuple({item_ids_tuple: long})
> >>>> > bag schema : item_names:{t:(item_names_tuple:chararray)} inner : t: tuple({item_names_tuple: chararray})
> >>>> > bag schema : tokens:{t:(tokens_tuple:chararray)} inner : t: tuple({tokens_tuple: chararray})
> >>>> >
> >>>> > The tuple wrapping gets stripped in Pig 0.10 (trunk).
> >>>> >
> >>>> > For trunk, adding an extra tuple wrapper fixes it, i.e., a bag of
> >>>> > longs looks like:
> >>>> >
> >>>> > new Schema(
> >>>> >     new FieldSchema("bag",
> >>>> >         new Schema(
> >>>> >             new FieldSchema("t",              // Extra layer
> >>>> >                 new Schema(                   // Extra layer
> >>>> >                     new FieldSchema("bag_tuple", null, DataType.LONG)
> >>>> >                 ), DataType.TUPLE)            // Extra layer
> >>>> >         ), DataType.BAG)
> >>>> > );
> >>>> >
> >>>> > But this does not seem compatible with Pig 0.8.
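Raghu's observation can be illustrated with a small, self-contained Java sketch. It is a hypothetical toy model, not Pig's real `Schema`/`FieldSchema` classes. The names `BagSchemaSketch`, `Field`, `bagOfLongs` and `isValidBag` are made up for illustration. The sketch builds a bag of longs with and without the extra tuple layer and renders each one in the style of the log lines above. It then applies a simplified version of the check behind ERROR 2218 (a bag's schema must hold a tuple).

```java
import java.util.List;

// Hypothetical toy model of Pig field schemas; NOT Pig's real
// org.apache.pig.impl.logicalLayer.schema classes.
public class BagSchemaSketch {

    enum Type { LONG, CHARARRAY, TUPLE, BAG }

    public static final class Field {
        final String alias;
        final Type type;
        final List<Field> inner; // empty for scalar types

        Field(String alias, Type type, List<Field> inner) {
            this.alias = alias;
            this.type = type;
            this.inner = inner;
        }

        static Field scalar(String alias, Type type) {
            return new Field(alias, type, List.of());
        }

        // Renders the field roughly like the "bag schema : ..." log lines.
        public String render() {
            switch (type) {
                case LONG: return alias + ":long";
                case CHARARRAY: return alias + ":chararray";
                case TUPLE: return alias + ":(" + join(inner) + ")";
                case BAG: return alias + ":{" + join(inner) + "}";
                default: throw new IllegalStateException("unknown type " + type);
            }
        }

        private static String join(List<Field> fields) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append(fields.get(i).render());
            }
            return sb.toString();
        }
    }

    // Simplified stand-in for the ERROR 2218 rule (an assumption, not Pig's
    // exact logic): a bag must contain exactly one field, and it must be a tuple.
    public static boolean isValidBag(Field bag) {
        return bag.type == Type.BAG
            && bag.inner.size() == 1
            && bag.inner.get(0).type == Type.TUPLE;
    }

    // Builds a bag of longs, optionally wrapping the element in an extra tuple "t".
    public static Field bagOfLongs(String name, boolean wrapInTuple) {
        Field element = Field.scalar(name + "_tuple", Type.LONG);
        Field content = wrapInTuple
            ? new Field("t", Type.TUPLE, List.of(element))
            : element;
        return new Field(name, Type.BAG, List.of(content));
    }

    public static void main(String[] args) {
        Field stripped = bagOfLongs("item_ids", false);
        Field wrapped = bagOfLongs("item_ids", true);
        System.out.println(stripped.render() + " valid=" + isValidBag(stripped));
        System.out.println(wrapped.render() + " valid=" + isValidBag(wrapped));
    }
}
```

Under these assumptions, `main` reports the stripped form `item_ids:{item_ids_tuple:long}` as invalid and the wrapped form `item_ids:{t:(item_ids_tuple:long)}` as valid. This mirrors the trunk-versus-0.8 log output. Pig's actual validation may differ in detail.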
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>
