[ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774065#action_12774065 ]
Thejas M Nair commented on PIG-1065:
------------------------------------

Can this be allowed (in case of incompatible schemas as in description) -
u0 = UNION f1, f2 as (key:chararray, v:chararray); ?

> In-determinate behaviour of Union when there are 2 non-matching schema's
> ------------------------------------------------------------------------
>
>                 Key: PIG-1065
>                 URL: https://issues.apache.org/jira/browse/PIG-1065
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> ====================
> Schema for u0 unknown.
> ====================
> (1,2)
> (2,3)
> (1)
> (2)
> ====================
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
>     at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:397)
> ====================
> Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> ====================
> When I run the same script in local mode I get a different result, as we know
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> ====================
> Schema for u0 unknown
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that
> this is not allowed or he can get unexpected results?
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.