[
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pradeep Kamath updated PIG-835:
-------------------------------
Attachment: PIG-835.patch
The root cause of the issue is that the current multiQueryOptimizer checks
whether the map key has the same type across the map plans it merges. If the
types differ, it makes the key type tuple for all map plans: keys which are not
tuples are wrapped in an extra tuple, and keys which are already of Tuple type
are left alone (this is ensured in POLocalRearrange). However, the Demux
operator, which passes the key and bag of values to the merged reduce plan,
currently always unwraps the tuple whenever the map key types differ. As a
result, keys which were originally tuples are also unwrapped, although they
should not be.
The attached patch fixes this by storing an array of boolean flags in the Demux
operator to indicate which map keys are wrapped and which are not, so that
unwrapping occurs only when the original map key was not already a tuple and
was therefore wrapped.
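To make the difference concrete, here is a minimal, self-contained sketch of the idea. It is not the code in the patch: the class DemuxKeySketch, its methods, and the isKeyWrapped array are hypothetical names chosen for illustration, and a java.util.List stands in for a Pig Tuple. It assumes one flag per merged map plan, set at the time the plans are merged.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these names are hypothetical and not Pig's API.
// A java.util.List stands in for org.apache.pig.data.Tuple.
final class DemuxKeySketch {

    // Map side: when the merged plans disagree on key type, a non-tuple key
    // is wrapped in a one-field tuple; a key that is already a tuple is kept.
    static Object wrapIfNeeded(Object key) {
        return (key instanceof List) ? key : Arrays.<Object>asList(key);
    }

    // Behaviour described as the bug: always take field 0 of the key,
    // even when the key was a tuple to begin with.
    static Object unwrapAlways(Object key) {
        return ((List<?>) key).get(0);
    }

    // Behaviour described as the fix: consult a per-plan flag and unwrap
    // only the keys that were actually wrapped on the map side.
    static Object unwrap(Object key, boolean[] isKeyWrapped, int planIndex) {
        return isKeyWrapped[planIndex] ? ((List<?>) key).get(0) : key;
    }

    public static void main(String[] args) {
        Object allKey = "all";                          // plan 0: group a ALL
        Object tupleKey = Arrays.asList("alice", 3.5);  // plan 1: group a by (name, gpa)
        boolean[] isKeyWrapped = {true, false};         // assumed to be set when plans merge

        // Always unwrapping hands the reduce plan a bare String for plan 1.
        System.out.println(unwrapAlways(wrapIfNeeded(tupleKey)));

        // With the flags, each plan gets back its original key.
        System.out.println(unwrap(wrapIfNeeded(allKey), isKeyWrapped, 0));
        System.out.println(unwrap(wrapIfNeeded(tupleKey), isKeyWrapped, 1));
    }
}
```

In this sketch, the first call returns the name field instead of the (name, gpa) tuple, which is the same kind of mismatch that appears as the ClassCastException from String to Tuple in the reported stack trace.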
> Multiquery optimization does not handle the case where the map keys in the
> split plans have different key types (tuple and non tuple key type)
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-835
> URL: https://issues.apache.org/jira/browse/PIG-835
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.2.1
> Reporter: Pradeep Kamath
> Assignee: Pradeep Kamath
> Fix For: 0.3.0
>
> Attachments: PIG-835.patch
>
>
> A query like the following results in an exception on execution:
> {noformat}
> a = load 'mult.input' as (name, age, gpa);
> b = group a ALL;
> c = foreach b generate group, COUNT(a);
> store c into 'foo';
> d = group a by (name, gpa);
> e = foreach d generate flatten(group), MIN(a.age);
> store e into 'bar';
> {noformat}
> Exception on execution:
> 09/06/04 16:56:11 INFO mapred.TaskInProgress: Error from attempt_200906041655_0001_r_000000_3: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:248)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:238)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:320)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)