Schema reported from DESCRIBE and actual schema of inner bags are different.

                 Key: PIG-767
             Project: Pig
          Issue Type: Bug
            Reporter: George Mavromatis
             Fix For: 0.2.0

The following script:

urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, 
-- describe and dump are in-sync
DESCRIBE urlContents;
DUMP urlContents;

urlContentsG = GROUP urlContents BY url;
DESCRIBE urlContentsG;

urlContentsF = FOREACH urlContentsG GENERATE group,;

DESCRIBE urlContentsF;
DUMP urlContentsF;

prints the following for the DESCRIBE commands:

urlContents: {url: chararray,pg: chararray}
urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
urlContentsF: {group: chararray,pg: {pg: chararray}}

The reported schemas for urlContentsG and urlContentsF are wrong: they omit the 
tuple inside the inner bags. They also contradict the section "Schemas for 
Complex Data Types" in

As expected, the actual data observed from DUMP urlContentsG and DUMP 
urlContentsF does contain the tuple inside the inner bags.

The correct schema for urlContentsG is:  {group: chararray,urlContents: 
{t1:(url: chararray,pg: chararray)}}

This may sound like a technicality, but it isn't. For instance, a UDF that 
assumes an inner bag of {chararray} will not work with {(chararray)}. 
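
To illustrate the mismatch, here is a minimal sketch in plain Python. It does 
not use Pig's UDF API. It assumes a bag can be modelled as a list and a tuple 
as a Python tuple. The function names and the sample values "page-a" and 
"page-b" are hypothetical and do not come from the report.

```python
# Plain-Python model of the two bag shapes; this is not Pig's API.
# Assumption: a Pig bag is modelled as a list, a Pig tuple as a Python tuple.

# Shape implied by the reported schema {pg: chararray}: bare values in the bag.
bag_as_described = ["page-a", "page-b"]  # hypothetical sample values

# Shape observed in DUMP, {(pg: chararray)}: one-field tuples in the bag.
bag_as_dumped = [("page-a",), ("page-b",)]


def join_pages(bag):
    """Hypothetical UDF body that assumes each bag element is a bare string."""
    return ",".join(bag)


def join_pages_from_tuples(bag):
    """Same logic written for the observed layout: unwrap field 0 of each tuple."""
    return ",".join(t[0] for t in bag)


# Works on the shape that DESCRIBE reports.
described_result = join_pages(bag_as_described)

# Fails on the shape that DUMP shows.
try:
    join_pages(bag_as_dumped)
    mismatch_error = None
except TypeError as exc:
    # str.join rejects tuple elements, so the string-only assumption breaks.
    mismatch_error = exc

# The tuple-aware version handles the observed shape.
dumped_result = join_pages_from_tuples(bag_as_dumped)
```

Code written against the schema that DESCRIBE reports would be shaped like 
join_pages, while the data that arrives needs join_pages_from_tuples.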
