[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774054#action_12774054
 ] 

Pradeep Kamath commented on PIG-1065:
-------------------------------------

This is an instance of the problem of representing unknown schema with a null 
schema. If the schema of a relational operator is null, pig assumes the fields 
are of type bytearray which is incorrect. An unknown schema really means we 
don't know the types of the fields. In the above case, once pig determines that 
the two schemas have different sizes, it sets the schema of LOUnion to null (to 
represent unknown schema). Hence the order by expects the fields coming out of 
the union to be byte arrays but in reality the first field (which is the sort 
key above) is a chararray - this results in a runtime exception.

I propose that when either of the two inputs to a union have a schema we should 
error out if the two are incompatible and not continue. If the two inputs don't 
have a schema then we can proceed with null schema - thoughts?

> In-determinate behaviour of Union when there are 2 non-matching schema's
> ------------------------------------------------------------------------
>
>                 Key: PIG-1065
>                 URL: https://issues.apache.org/jira/browse/PIG-1065
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> ====================
> Schema for u0 unknown.
> ====================
> (1,2)
> (2,3)
> (1)
> (2)
> ====================
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
>         at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:397)
> ====================
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> ====================
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> ====================
> Schema for u0 unknown
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to