[ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
-------------------------------

    Attachment: PIG-835.patch

The root cause of the issue is that the current multiQueryOptimizer checks 
whether the map keys of the different map plans it merges are all of the same 
type. If the types differ, it makes the key type tuple for all map plans: keys 
that are not tuples are wrapped in an extra tuple, while keys that are already 
of Tuple type are left alone (this is ensured in POLocalRearrange). However, 
the Demux operator, which passes the key and bag of values to the merged 
reduce plan, currently always unwraps the tuple whenever the map key types 
differ. As a result, keys that were originally tuples are unwrapped even 
though they should not be. 

The attached patch fixes this by storing an array of boolean flags in the Demux 
operator to indicate which map keys were wrapped and which were not, so that 
unwrapping occurs only when the original map key was not already a tuple and 
was therefore wrapped.
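
To illustrate the idea, here is a minimal standalone sketch of per-plan 
unwrapping. It is not the code in the patch: the class name, the field name, 
and the use of java.util.List as a stand-in for Pig's Tuple are all 
assumptions made for illustration, as are the sample key values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Demux operator's key handling; the real
// PODemux works on org.apache.pig.data.Tuple, which is modelled here as a List.
public class DemuxKeyUnwrapSketch {

    // One flag per merged map plan: true if that plan's key was NOT a tuple
    // originally and was therefore wrapped in an extra tuple.
    private final boolean[] keyWasWrapped;

    public DemuxKeyUnwrapSketch(boolean[] keyWasWrapped) {
        this.keyWasWrapped = keyWasWrapped.clone();
    }

    // Returns the key to pass to the reduce plan at planIndex.
    public Object keyForPlan(int planIndex, List<Object> key) {
        if (keyWasWrapped[planIndex]) {
            // The key was wrapped on the map side: strip the extra tuple.
            return key.get(0);
        }
        // The key was a tuple to begin with: pass it through untouched.
        return key;
    }

    public static void main(String[] args) {
        // Plan 0 models "group a ALL" (non-tuple key, wrapped);
        // plan 1 models "group a by (name, gpa)" (tuple key, left alone).
        DemuxKeyUnwrapSketch demux =
            new DemuxKeyUnwrapSketch(new boolean[] {true, false});

        List<Object> wrappedKey = Arrays.<Object>asList("all");
        List<Object> tupleKey = Arrays.<Object>asList("alice", 3.5);

        System.out.println(demux.keyForPlan(0, wrappedKey)); // prints: all
        System.out.println(demux.keyForPlan(1, tupleKey));   // prints: [alice, 3.5]
    }
}
```

With a single "keys differ" switch instead of per-plan flags, the second key 
would also be reduced to its first field, which matches the 
String-cannot-be-cast-to-Tuple failure in the report below.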

> Multiquery optimization does not handle the case where the map keys in the 
> split plans have different key types (tuple and non tuple key type)
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-835
>                 URL: https://issues.apache.org/jira/browse/PIG-835
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>         Attachments: PIG-835.patch
>
>
> A query like the following results in an exception on execution:
> {noformat}
> a = load 'mult.input' as (name, age, gpa);
> b = group a ALL;
> c = foreach b generate group, COUNT(a);
> store c into 'foo';
> d = group a by (name, gpa);
> e = foreach d generate flatten(group), MIN(a.age);
> store e into 'bar';
> {noformat}
> Exception on execution:
> 09/06/04 16:56:11 INFO mapred.TaskInProgress: Error from 
> attempt_200906041655_0001_r_000000_3: java.lang.ClassCastException: 
> java.lang.String cannot be cast to org.apache.pig.data.Tuple
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:248)
>     at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:238)
>     at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:320)
>     at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
>     at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
>     at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
