Hi Ravi,

I believe this is handled in https://issues.apache.org/jira/browse/PIG-3492 .
If I remember correctly, in 0.10 DESCRIBE would show an incorrect schema, but 
running the script still worked because the inconsistency got fixed in the 
optimization phase.
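
A quick way to check this on 0.10 might be to swap the DESCRIBE for a DUMP in 
the sample script, so the plan actually goes through optimization and executes. 
This is an untested sketch that reuses the script from the original mail and 
assumes the sample 'data' file is in the working directory:

```pig
-- Same pipeline as in the original mail; only the last statement differs.
data  = LOAD 'data' USING PigStorage('|')
        AS (parent:chararray, child:chararray, edge_id:chararray);
data1 = FOREACH data GENERATE parent AS node1, child AS node2, edge_id;
data2 = FOREACH data GENERATE child AS node1, parent AS node2, edge_id;
data3 = UNION data1, data2;
data4 = FOREACH data3 GENERATE node1, node2;

-- DUMP executes the plan instead of only printing the schema.
DUMP data4;
```

If execution is indeed unaffected, each output tuple should carry two different 
values (one from parent, one from child) rather than the same column twice.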

Koji


On Feb 10, 2014, at 5:01 PM, "Kodre,Ravi" <[email protected]> wrote:

> I am seeing a bug in Pig's behavior. I created a sample dataset to share 
> the details because I cannot share the original data and script.
> 
> This is my sample input file
> 
> Data
> 
> John|Gary|42
> 
> Pig Script
> 
> data = LOAD 'data' USING PigStorage('|') AS (parent:chararray, 
> child:chararray, edge_id:chararray);
> 
> data1 = FOREACH data GENERATE parent AS node1, child AS node2, edge_id;
> 
> data2 = FOREACH data GENERATE child AS node1, parent AS node2, edge_id;
> 
> data3 = UNION data1, data2;
> 
> data4 = FOREACH data3 GENERATE node1, node2;
> 
> DESCRIBE data4;
> 
> $pig -x local bug.pig
> 
> 2014-02-10 13:55:31,201 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.0-cdh3u4a (rexported) compiled Sep 04 2012, 14:03:46
> 2014-02-10 13:55:31,201 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /x/home/abc/pig_1392069331197.log
> 2014-02-10 13:55:31,452 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> data4: {node2: chararray,node2: chararray}
> 
> 
> I should be getting node1 and node2 in my schema, but I am getting node2 
> twice. Can anyone tell me what I am doing wrong here?
> 
> Thanks,
> Ravi.
> 
> (Paypal Data Scientist)
