Hi Ravi,

I believe this is handled in https://issues.apache.org/jira/browse/PIG-3492 . If I remember correctly, in 0.10 DESCRIBE would show an incorrect schema, but the script itself still ran correctly because the inconsistency was fixed during the optimization phase.
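
If the problem is only in the reported schema, a quick way to confirm it is to add a DUMP at the end of your script. This is a sketch that assumes the script and the one-line input file from your message. The expected rows follow from your input, not from a rerun on 0.10:

```pig
-- Assumes data3 is built exactly as in the script quoted below.
data4 = FOREACH data3 GENERATE node1, node2;
DESCRIBE data4;  -- on 0.10 this may still print {node2: chararray,node2: chararray}
DUMP data4;      -- the rows should still be (John,Gary) and (Gary,John); UNION does not guarantee order
```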

Koji

On Feb 10, 2014, at 5:01 PM, "Kodre,Ravi" <[email protected]> wrote:

> I am seeing a bug in the pig behavior. So I went ahead and created a sample
> dataset to share the bug details because I cannot share the original data and
> script.
>
> This is my sample input file
>
> Data
>
> John|Gary|42
>
> Pig Script
>
> data = LOAD 'data' USING PigStorage('|') AS (parent:chararray,
> child:chararray, edge_id:chararray);
>
> data1 = FOREACH data GENERATE parent AS node1, child AS node2, edge_id;
>
> data2 = FOREACH data GENERATE child AS node1, parent AS node2, edge_id;
>
> data3 = UNION data1, data2;
>
> data4 = FOREACH data3 GENERATE node1, node2;
>
> DESCRIBE data4;
>
> $pig -x local bug.pig
>
> 2014-02-10 13:55:31,201 [main] INFO org.apache.pig.Main - Apache Pig version
> 0.10.0-cdh3u4a (rexported) compiled Sep 04 2012, 14:03:46
> 2014-02-10 13:55:31,201 [main] INFO org.apache.pig.Main - Logging error
> messages to: /x/home/abc/pig_1392069331197.log
> 2014-02-10 13:55:31,452 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> data4: {node2: chararray,node2: chararray}
>
>
> I should be getting node1 and node2 in my schema but I am getting node2
> twice. Can anyone tell me what I am doing wrong here ?
>
> Thanks,
> Ravi.
>
> (Paypal Data Scientist)
