[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774086#action_12774086
 ] 

Pradeep Kamath commented on PIG-1065:
-------------------------------------

Currently pig parser does not allow specifying a schema for the union. If we do 
want to allow it there are a few details which arise:
1) From what I know, currently pig allowing specifying row schemas only in load 
statements. Is this restriction by design? ForEach allows only to give 
different schemas at individual field level and even there the type has to 
match if I recollect right.
2)  If the input schemas ae unequal in size, should the specified union schema 
size strictly be MAX of the input schema sizes with nulls being projected for 
missing columns in the input with the smaller schema? So a specified schema of 
size < MAX(size of input schemas) will not be allowed?
3) Can the specified union schema have different types (castable) than the 
result of merging the two input schemas - for example if after merging the two 
input schemas if the first input has int and if specified schema has long? What 
about demotions like the merged schema being long and specified one being int - 
would those be disallowed? - I suppose we just allow whatever is allowed in 
casts

> In-determinate behaviour of Union when there are 2 non-matching schema's
> ------------------------------------------------------------------------
>
>                 Key: PIG-1065
>                 URL: https://issues.apache.org/jira/browse/PIG-1065
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> ====================
> Schema for u0 unknown.
> ====================
> (1,2)
> (2,3)
> (1)
> (2)
> ====================
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
>         at org.apache.pig.PigServer.openIterator(PigServer.java:475)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:397)
> ====================
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> ====================
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> ====================
> Schema for u0 unknown
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> (1,2)
> (1)
> (2,3)
> (2)
> ====================
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to