Schema of a relation reported by DESCRIBE and allowed operations on the 
relation are not compatible

                 Key: PIG-768
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.2.0
            Reporter: George Mavromatis
             Fix For: 0.2.0

The DESCIBE command in the following script  prints:
{s: bytearray, pg: bytearray, wm: bytearray}

However, the script later treats the s field of urlMap as a map instead of a 
bytearray, as shown in s#'Url'.

Pig does not complain about this contradiction and at execution time, the s 
field is treated as hash, although it was reported as byterray at parse time.

Pig should either not report s as a byterray or exit with a parsing error.

Note that all above operations happen before the query executes at the cluster.

register WebDataProcessing.jar; 
register opencrawl.jar; 

urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);


-- in fact the loader in the WebDataProcessing.jar populates s and pg as 
s:map[], pg:bag{t1:(contents:bytearray)}
-- and defines that in determineSchema() but pig describe ignores it!

urlMap2 = LIMIT urlMap 20;

urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;

DESCRIBE urlList2;

STORE urlList2 INTO 'output2' USING BinStorage();

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to