In-determinate behaviour of Union when there are 2 non-matching schema's
------------------------------------------------------------------------

                 Key: PIG-1065
                 URL: https://issues.apache.org/jira/browse/PIG-1065
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: Viraj Bhat
             Fix For: 0.6.0


I have a script which first does a union of these schemas and then does a ORDER 
BY of this result.

{code}
f1 = LOAD '1.txt' as (key:chararray, v:chararray);
f2 = LOAD '2.txt' as (key:chararray);
u0 = UNION f1, f2;
describe u0;
dump u0;

u1 = ORDER u0 BY $0;
dump u1;
{code}

When I run in Map Reduce mode I get the following result:
$java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
====================
Schema for u0 unknown.
====================
(1,2)
(2,3)
(1)
(2)
====================
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias u1
        at org.apache.pig.PigServer.openIterator(PigServer.java:475)
        at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
        at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:397)
====================
Caused by: java.io.IOException: Type mismatch in key from map: expected 
org.apache.pig.impl.io.NullableBytesWritable, recieved 
org.apache.pig.impl.io.NullableText
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
====================

When I run the same script in local mode I get a different result, as we know 
that local mode does not use any Hadoop Classes.
$java -cp pig.jar org.apache.pig.Main -x local broken.pig
====================
Schema for u0 unknown
====================
(1,2)
(1)
(2,3)
(2)
====================
(1,2)
(1)
(2,3)
(2)
====================

Here are some questions
1) Why do we allow union if the schemas do not match
2) Should we not print an error message/warning so that the user knows that 
this is not allowed or he can get unexpected results?

Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to