Pradeep Kamath updated PIG-449:

    Assignee: Pradeep Kamath  (was: Santhosh Srinivasan)
      Status: Patch Available  (was: Open)

A new flag has been introduced in Schema to distinguish bag schemas that have 
exactly one field schema: a tuple that in turn holds the list of field schemas 
for the elements in the bag (this kind of bag schema occurs in the two cases 
explained in the code comment below). The flag is used to solve the problems 
reported in this issue by resolving access to fields in such bags as access to 
the fields present in the inner tuple schema. The comment for this flag is 
pasted here for reference:
    // In bags which have a schema with a tuple which contains
    // the fields present in it, if we access the second field (say)
    // we are actually trying to access the second field in the
    // tuple in the bag. This is currently true for two cases:
    // 1) bag constants - the schema of bag constant has a tuple
    // which internally has the actual elements
    // 2) When bags are loaded from input data, if the user 
    // specifies a schema with the "bag" type, he has to specify
    // the bag as containing a tuple with the actual elements in 
    // the schema declaration. However in both the cases above,
    // the user can still say b.i where b is the bag and i is 
    // an element in the bag's tuple schema. So in these cases,
    // the access should translate to a lookup for "i" in the 
    // tuple schema present in the bag. To indicate this, the
    // flag below is used. It is false by default because, 
    // currently we use bag as the type for relations. However 
    // the schema of a relation does NOT have a tuple fieldschema
    // with items in it. Instead, the schema directly has the 
    // field schema of the items. So for a relation "b", the 
    // above b.i access would be a direct single level access
    // of i in b's schema. This is treated as the "default" case
    private boolean twoLevelAccessRequired = false;

The changes are in getPosition() in Schema.java, which now uses the above flag 
to do a two-level access whenever a bag of the above kind is accessed. In 
addition, getSchema() of LOForEach and getFieldSchema() of LOProject have been 
changed to use the inner tuple schema for these kinds of bags. A new unit test 
case, TestDataBagAccess, has also been added to exercise various access 
scenarios for bag schemas that have a tuple field schema with a list of item 
field schemas.
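
The two-level lookup described above can be sketched roughly as follows. This 
is only an illustration, not the code from the patch: the classes 
TwoLevelAccessSketch, SimpleSchema and Field are hypothetical stand-ins for 
Pig's Schema and FieldSchema, and returning -1 for an unknown alias is an 
assumption made for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class TwoLevelAccessSketch {

    /** Hypothetical stand-in for a field schema: an alias plus an optional nested schema. */
    static final class Field {
        final String alias;
        final SimpleSchema inner; // null for scalar fields

        Field(String alias, SimpleSchema inner) {
            this.alias = alias;
            this.inner = inner;
        }
    }

    /** Hypothetical stand-in for a schema: an ordered list of fields plus the flag. */
    static final class SimpleSchema {
        private final List<Field> fields;
        // False by default, which matches the single-level access used for relations.
        private boolean twoLevelAccessRequired = false;

        SimpleSchema(Field... fields) {
            this.fields = Arrays.asList(fields);
        }

        void setTwoLevelAccessRequired(boolean required) {
            this.twoLevelAccessRequired = required;
        }

        /** Returns the position of the alias, or -1 if absent (an assumption of this sketch). */
        int getPosition(String alias) {
            if (twoLevelAccessRequired) {
                // The bag schema is expected to hold exactly one tuple field,
                // so the lookup is delegated to that tuple's schema.
                if (fields.size() != 1 || fields.get(0).inner == null) {
                    throw new IllegalStateException("expected a single tuple field");
                }
                return fields.get(0).inner.getPosition(alias);
            }
            // Default case: look the alias up directly in this schema.
            for (int i = 0; i < fields.size(); i++) {
                if (alias.equals(fields.get(i).alias)) {
                    return i;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        // Mirrors b: {t: (i, d, c)} from the issue description.
        SimpleSchema tuple = new SimpleSchema(
                new Field("i", null), new Field("d", null), new Field("c", null));
        SimpleSchema bag = new SimpleSchema(new Field("t", tuple));

        System.out.println(bag.getPosition("i")); // -1: single-level lookup only sees "t"
        bag.setTwoLevelAccessRequired(true);
        System.out.println(bag.getPosition("i")); // 0: resolved inside the tuple
        System.out.println(bag.getPosition("d")); // 1
    }
}
```

With the flag left at its default, a schema whose fields are listed directly 
(the relation case) is still resolved with a single-level lookup.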

> Schemas for bags should contain tuples all the time
> ---------------------------------------------------
>                 Key: PIG-449
>                 URL: https://issues.apache.org/jira/browse/PIG-449
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-449.patch
> The front end treats relations as operators that return bags.  When the 
> schema of a load statement is specified, the bag is associated with the 
> schema specified by the user. Ideally, the schema corresponds to the tuple 
> contained in the bag. 
> With PIG-380, the schema for bag constants is computed by the front end. The 
> schema for the bag contains the tuple, which in turn contains the schema of 
> the columns. This results in errors when columns are accessed directly, as 
> they are for load statements.
> The front end should then treat access to the columns as a double 
> dereference, i.e., access the tuple inside the bag and then the column inside 
> the tuple.
> {code}
> grunt> a = load '/user/sms/data/student.data' using PigStorage(' ') as (name, age, gpa);
> grunt> b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: double, c: chararray)};
> grunt> describe b;
> b: {b: {t: (i: integer,d: double,c: chararray)}}
> grunt> c = foreach b generate b.i;
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - 
> java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: 
> chararray)}
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:293)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:258)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:93)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
>         at org.apache.pig.Main.main(Main.java:282)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: i in {t: (i: integer,d: double,c: chararray)}
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5851)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5709)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5242)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4040)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3909)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3863)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3772)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3698)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3664)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3590)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3500)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3457)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2933)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2336)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:973)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:748)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:549)
>         at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:290)
>         ... 6 more
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - Invalid alias: 
> i in {t: (i: integer,d: double,c: chararray)}
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - 
> java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: 
> chararray)}
> grunt> c = foreach b generate b.t;
> grunt> describe c;
> c: {t: {i: integer,d: double,c: chararray}}
> {code}

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
