[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776116#action_12776116
 ] 

Pradeep Kamath commented on PIG-1065:
-------------------------------------

To elaborate a little on the errors that arise, consider the script in the 
description of this issue. We have incompatible schemas and hence we set the 
schema of the union to be null (unknown). However the first field actually is a 
chararray as specified by the schemas. Hence there is a type mismatch in the 
backend (unknown schema implies all fields are bytearrays).

Another example is something like:
{noformat}
A = load 'foo' as (c:chararray);
B = load 'bar';
C = Union A, B;
D = group C by $0;
{noformat}

In the above script also, currently we assign unknown schema to C. However the 
key of the group by will be bytearray when it comes from B and chararray when 
it comes from A.

If we have to honor schema of any of the inputs of the union, we can only do so 
if we can have a legitimate schema for the union - currently in all 
incompatible schema situations, we essentially disregard input schemas and 
assume all fields are bytearrays.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> ------------------------------------------------------------------------
>
>                 Key: PIG-1065
>                 URL: https://issues.apache.org/jira/browse/PIG-1065
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> ====================
> Schema for u0 unknown.
> ====================
> (1,2)
> (2,3)
> (1)
> (2)
> ====================
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
>         at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:397)
> ====================
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> ====================
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> ====================
> Schema for u0 unknown
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to