Santhosh Srinivasan commented on PIG-767:

For reference, please see PIG-449.

The tuple inside a bag cannot be accessed by name or position. There are no 
semantics that support accessing a tuple inside a bag. However, the contents of 
the tuple inside a bag are accessible. As such, the presence or absence of a 
tuple inside a bag (as part of the schema) in the describe output does not 

E.g.: urlContentsG: {group: chararray,urlContents: {t1:(url: chararray,pg: 
chararray)}}

In the above schema, you can access urlContents.url. You will not be able to 
access urlContents.t1.

An example to illustrate this point follows:

grunt> a = load 'input' as (bagColumn: bag{t: tuple(i: int, f: float)});
grunt> b = foreach a generate bagColumn.t;
2009-04-16 13:23:43,324 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1028: Access to the tuple (t) of the bag is disallowed. Only access to the 
elements of the tuple in the bag is allowed.

Details at logfile: 

grunt> c = foreach a generate bagColumn.i;
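
The same rule can be sketched outside of Pig. The Python snippet below is only 
a toy model of the behaviour described above, not Pig's implementation. The 
names BagSchema, project and bag_column are hypothetical and exist only for 
this illustration. It assumes a bag can be modelled as a list of tuples.

```python
# Toy model of the access rule above. This is NOT how Pig implements it.
# A bag is modelled as a list of tuples. The schema records the alias of the
# tuple inside the bag and the names of the fields of that tuple.


class BagSchema:
    """Hypothetical schema holder: one tuple alias plus ordered field names."""

    def __init__(self, tuple_alias, field_names):
        self.tuple_alias = tuple_alias
        self.field_names = list(field_names)


def project(bag, schema, name):
    """Mimic bagColumn.<name> by projecting one field out of every tuple."""
    if name == schema.tuple_alias:
        # Corresponds to ERROR 1028 in the grunt session above.
        raise ValueError(
            "Access to the tuple (%s) of the bag is disallowed. Only access "
            "to the elements of the tuple in the bag is allowed." % name
        )
    if name not in schema.field_names:
        raise KeyError(name)
    index = schema.field_names.index(name)
    # The result is still a bag of tuples, each with a single field.
    return [(row[index],) for row in bag]


# Mirrors: bagColumn: bag{t: tuple(i: int, f: float)}
schema = BagSchema("t", ["i", "f"])
bag_column = [(1, 1.5), (2, 2.5)]

projected = project(bag_column, schema, "i")  # like bagColumn.i
```

In this model, projecting a field still yields a bag of single-field tuples, 
while asking for "t" raises an error in the same way bagColumn.t does in grunt.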


> Schema reported from DESCRIBE and actual schema of inner bags are different.
> ----------------------------------------------------------------------------
>                 Key: PIG-767
>                 URL: https://issues.apache.org/jira/browse/PIG-767
>             Project: Pig
>          Issue Type: Bug
>            Reporter: George Mavromatis
>             Fix For: 0.2.0
> The following script:
> urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, 
> pg:bytearray);
> -- describe and dump are in-sync
> DESCRIBE urlContents;
> DUMP urlContents;
> urlContentsG = GROUP urlContents BY url;
> DESCRIBE urlContentsG;
> urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;
> DESCRIBE urlContentsF;
> DUMP urlContentsF;
> Prints for the DESCRIBE commands:
> urlContents: {url: chararray,pg: chararray}
> urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
> urlContentsF: {group: chararray,pg: {pg: chararray}}
> The reported schemas for urlContentsG and urlContentsF are wrong. They are 
> also against the section "Schemas for Complex Data Types" in 
> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.
> As expected, actual data observed from DUMP urlContentsG and DUMP 
> urlContentsF do contain the tuple inside the inner bags.
> The correct schema for urlContentsG is:  {group: chararray,urlContents: 
> {t1:(url: chararray,pg: chararray)}}
> This may sound like a technicality, but it isn't. For instance, a UDF that 
> assumes an inner bag of {chararray} will not work with {(chararray)}. 

