Santhosh Srinivasan commented on PIG-449:

Currently, bags in Pig are containers of tuples. Accessing elements inside a 
bag should translate to accessing elements inside the tuple contained in the 
bag. In addition, accessing tuples inside a bag should be restricted to the 
FLATTEN keyword in a FOREACH statement. A few examples shown below will 
demonstrate the point.

a = load '/user/pig/data/student.data' using PigStorage(' ') as (name, age, 
b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: double, c: 
c = foreach b generate b.i; -- Here b.i should generate a bag of integers by 
accessing the column called 'i' inside each tuple
d = foeach b generate b.t; -- This should be outlawed as the tuple inside the 
bag does not have a column called 't' although the tuple inside the bag are 
named 't'


1. The frontend should translate access to columns in a bag to columns inside 
the tuple in the bag
2. The frontend should prevent access to tuples inside the bag via projections 
and allow access only via the FLATTEN keyword

Thoughts/suggestions/comments are welcome.

> Schemas for bags should contain tuples all the time
> ---------------------------------------------------
>                 Key: PIG-449
>                 URL: https://issues.apache.org/jira/browse/PIG-449
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
> The front end treats relations as operators that return bags.  When the 
> schema of a load statement is specified, the bag is associated with the 
> schema specified by the user. Ideally, the schema corresponds to the tuple 
> contained in the bag. 
> With PIG-380, the schema for bag constants are computed by the front end. The 
> schema for the bag contains the tuple which in turn contains the schema of 
> the columns. This results in errors when columns are accessed directly just 
> like the load statements.
> The front end should then treat access to the columns as a double 
> dereference, i.e., access the tuple inside the bag and then the column inside 
> the tuple.
> {code}
> grunt> a = load '/user/sms/data/student.data' using PigStorage(' ') as (name, 
> age, gpa);
> grunt> b = foreach a generate {(16, 4.0e-2, 'hello')} as b:{t:(i: int, d: 
> double, c: chararray)};
> grunt> describe b;
> b: {b: {t: (i: integer,d: double,c: chararray)}}
> grunt> c = foreach b generate b.i;
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - 
> java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: 
> chararray)}
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:293)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:258)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:93)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
>         at org.apache.pig.Main.main(Main.java:282)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
> alias: i in {t: (i: integer,d: double,c: chararray)}
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:5851)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5709)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5242)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4040)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3909)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3863)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3772)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3698)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3664)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3590)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3500)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3457)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2933)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2336)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:973)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:748)
>         at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:549)
>         at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
>         at org.apache.pig.PigServer.parseQuery(PigServer.java:290)
>         ... 6 more
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - Invalid alias: 
> i in {t: (i: integer,d: double,c: chararray)}
> 111064 [main] ERROR org.apache.pig.tools.grunt.GruntParser  - 
> java.io.IOException: Invalid alias: i in {t: (i: integer,d: double,c: 
> chararray)}
> grunt> c = foreach b generate b.t;
> grunt> describe c;
> c: {t: {i: integer,d: double,c: chararray}}
> {code}

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to