Ah, I guess the CHANGES.txt file isn't just changes for the current branch.

-Kim

On Sun, Jun 19, 2011 at 2:15 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> The jira is for pig 0.2 ... it's been that way for a while. The whole
> twoLevelAccess thing was removed in 9 plus, though, I am wondering if
> that changes how we need to declare schemas, or if perhaps it wasn't
> removed all the way.
>
> On Sat, Jun 18, 2011 at 9:32 AM, Kim Vogt <k...@simplegeo.com> wrote:
> > This jira appears to be relevant
> > https://issues.apache.org/jira/browse/PIG-449
> >
> > -Kim
> >
> > On Sat, Jun 18, 2011 at 9:25 AM, Kim Vogt <k...@simplegeo.com> wrote:
> >
> >> To clarify, I get no errors when I run with the old TOBAG, the one in the
> >> gist without the outputSchema.
> >>
> >> -Kim
> >>
> >>
> >> On Sat, Jun 18, 2011 at 9:23 AM, Kim Vogt <k...@simplegeo.com> wrote:
> >>
> >>> I think it has something to do with the TOBAG udf (
> >>> https://gist.github.com/1033242). When I run:
> >>>
> >>> a = load 'x.txt' as (a:long, b:long);
> >>> -- dump a;
> >>>
> >>> x = foreach a generate a, TOBAG(a, b) as abag;
> >>> y = foreach x generate TOTUPLE(a, abag) as atuple;
> >>> z = foreach y generate atuple.abag;
> >>> describe z;
> >>> dump z;
> >>>
> >>> I get no errors.
> >>>
> >>> grunt> describe z
> >>>
> >>> z: {abag: {null}}
> >>>
> >>> grunt> dump z
> >>>
> >>> 2011-06-18 09:18:34,106 [main] INFO
> >>> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> >>> paths to process : 1
> >>>
> >>> ({})
> >>>
> >>> ({})
> >>>
> >>> ({})
> >>>
> >>> Hope that helps!
> >>>
> >>> -Kim
> >>>
> >>> On Sat, Jun 18, 2011 at 8:05 AM, Gianmarco <gianmarco....@gmail.com> wrote:
> >>>
> >>>> Hi Dmitriy,
> >>>>
> >>>> Unfortunately I don't have a solution but I can report something related.
> >>>> I don't know exactly why there was a change between 0.8 and 0.9, but I also
> >>>> noticed that now complex schemas are handled differently.
> >>>> For example:
> >>>>
> >>>> In 0.8 I would use:
> >>>> raw = LOAD '$input' AS (username:chararray,
> >>>>     topics:bag{t(topic:chararray)},
> >>>>     links:bag{t(link:chararray)});
> >>>>
> >>>> But in 0.9 this doesn't work and I need to strip that extra 't' from the
> >>>> bags.
> >>>> raw = LOAD '$input' AS (username:chararray,
> >>>>     topics:bag{(topic:chararray)},
> >>>>     links:bag{(link:chararray)});
> >>>>
> >>>> I assume this is the same problem and the change actually was introduced in
> >>>> 0.8.1
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Gianmarco De Francisci Morales
> >>>>
> >>>>
> >>>> On Sat, Jun 18, 2011 at 16:42, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >>>>
> >>>> > Hi folks,
> >>>> > We've migrated to pig 0.8.1 and everything went pretty smoothly except
> >>>> > for one oddity involving how we generate schemas for complex Thrift
> >>>> > structures; namely, it seems like we get into trouble now when our
> >>>> > tuple contains lists.
> >>>> >
> >>>> > The gory details are in
> >>>> > https://github.com/kevinweil/elephant-bird/issues/60 but here's a
> >>>> > summary. Any help, or pointers to relevant Jiras, would be much
> >>>> > appreciated.
> >>>> >
> >>>> > in 8.1, reading that kind of structure seems to be broken altogether;
> >>>> > this fails:
> >>>> >
> >>>> > a = load 'x.txt' as (a:long, b:long);
> >>>> > -- dump a;
> >>>> >
> >>>> > x = foreach a generate a, TOBAG(a, b) as abag;
> >>>> > y = foreach x generate TOTUPLE(a, abag) as atuple;
> >>>> > z = foreach y generate atuple.abag;
> >>>> > describe z;
> >>>> > dump z;
> >>>> >
> >>>> > In trunk, the above snippet works, but loading a relation with a
> >>>> > schema we generate for the following thrift definition does not work
> >>>> > (0.6 and 0.8 don't complain):
> >>>> > struct LogEvent {
> >>>> >   1: optional EventDetails event_details
> >>>> > }
> >>>> >
> >>>> > struct EventDetails {
> >>>> >   1: optional list item_ids
> >>>> > }
> >>>> >
> >>>> > The error:
> >>>> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR
> >>>> > 2218: Invalid resource schema: bag schema must have tuple as its field
> >>>> >   at org.apache.pig.ResourceSchema$ResourceFieldSchema.throwInvalidSchemaException(ResourceSchema.java:213)
> >>>> >   at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1881)
> >>>> >   at org.apache.pig.impl.logicalLayer.schema.Schema.getPigSchema(Schema.java:1871)
> >>>> >   at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
> >>>> >
> >>>> > And some of Raghu's comments:
> >>>> > There is some stripping going on in pig trunk. Added following log
> >>>> > line to Schema.java (line 1986 in 0.8 and 1880 in trunk) :
> >>>> > log.info("XXX : bag schema : " + rfs + " inner : " + innerFs);
> >>>> >
> >>>> > With same EB jar:
> >>>> >
> >>>> > On trunk :
> >>>> > bag schema : item_ids:{item_ids_tuple:long} inner : item_ids_tuple: long
> >>>> > followed by error shown above.
> >>>> >
> >>>> > On 0.8 :
> >>>> > bag schema : item_ids:{t:(item_ids_tuple:long)} inner : t: tuple({item_ids_tuple: long})
> >>>> > bag schema : item_names:{t:(item_names_tuple:chararray)} inner : t: tuple({item_names_tuple: chararray})
> >>>> > bag schema : tokens:{t:(tokens_tuple:chararray)} inner : t: tuple({tokens_tuple: chararray})
> >>>> >
> >>>> > The tuple wrapping gets stripped in Pig 10.
> >>>> >
> >>>> > For trunk, adding an extra Tuple wrapper fixes it. ie, a bag of longs
> >>>> > looks like :
> >>>> >
> >>>> > new Schema(
> >>>> >   new FieldSchema( "bag",
> >>>> >     new Schema (
> >>>> >       new FieldSchema( "t", // Extra layer
> >>>> >         new Schema ( // Extra layer
> >>>> >           new FieldSchema( "bag_tuple", null, DataType.LONG )
> >>>> >         ), DataType.TUPLE // Extra layer
> >>>> >       )
> >>>> >     ), DataType.BAG
> >>>> >   )
> >>>> > )
> >>>> > But this does not seem compatible with pig 0.8.
> >>>> >
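
For anyone skimming the thread, here is a minimal sketch of the rule that ERROR 2218 seems to enforce, and of the "extra layer" workaround quoted above. Everything in it (BagSchemaSketch, Field, Type, isValidBag, wrapInTuple) is a hypothetical stand-in written for illustration; it does not use or reproduce Pig's Schema, FieldSchema, or ResourceSchema classes, and the "exactly one inner field" condition is my reading of the error message rather than something confirmed against the Pig source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a schema tree; NOT Pig's API.
final class BagSchemaSketch {
    enum Type { LONG, CHARARRAY, TUPLE, BAG }

    static final class Field {
        final String alias;
        final Type type;
        final List<Field> inner; // sub-fields for TUPLE and BAG, empty otherwise

        Field(String alias, Type type, Field... inner) {
            this.alias = alias;
            this.type = type;
            this.inner = Collections.unmodifiableList(Arrays.asList(inner));
        }
    }

    // Assumed rule: a bag holds exactly one inner field, and it is a tuple.
    static boolean isValidBag(Field bag) {
        return bag.type == Type.BAG
                && bag.inner.size() == 1
                && bag.inner.get(0).type == Type.TUPLE;
    }

    // Wraps a bag's bare inner fields in a tuple named "t",
    // mirroring the "Extra layer" lines in the quoted snippet.
    static Field wrapInTuple(Field bag) {
        if (bag.type != Type.BAG) {
            throw new IllegalArgumentException("not a bag: " + bag.alias);
        }
        if (isValidBag(bag)) {
            return bag; // already wrapped, nothing to do
        }
        Field tuple = new Field("t", Type.TUPLE, bag.inner.toArray(new Field[0]));
        return new Field(bag.alias, Type.BAG, tuple);
    }

    public static void main(String[] args) {
        // Shape logged on trunk: item_ids:{item_ids_tuple:long}
        Field bare = new Field("item_ids", Type.BAG,
                new Field("item_ids_tuple", Type.LONG));
        System.out.println("bare bag valid:    " + isValidBag(bare));
        System.out.println("wrapped bag valid: " + isValidBag(wrapInTuple(bare)));
    }
}
```

The bare bag corresponds to the shape logged on trunk, and the wrapped one to the shape logged on 0.8. Whether a single schema definition can satisfy both versions is the open question in this thread, and the sketch does not answer it.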