[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-835:
-------------------------------

    Attachment: PIG-835.patch

The root cause of the issue is that the current multiQueryOptimizer checks whether the map key is of the same type across the different map plans it merges. If the types differ, it makes the key type tuple for all map plans. This means keys which are not tuples are wrapped in an extra tuple, while keys which are already of Tuple type are left alone (this is ensured in POLocalRearrange). However, the Demux operator, which passes the key and bag of values to the merged reduce plan, currently always unwraps the tuple whenever the map keys are different. As a result, keys which were originally tuples get unwrapped even though they should not be.

The attached patch fixes this by storing an array of boolean flags in the Demux operator to indicate which map keys are wrapped and which are not, so that unwrapping occurs only where the original map key was not already a tuple and was wrapped.
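The wrap/unwrap rule described in this comment can be sketched in isolation. This is a simplified illustration only, not the code in PIG-835.patch: `DemuxKeySketch`, `SimpleTuple`, `wrapIfNeeded`, `restoreKey` and the `isKeyWrapped` array are hypothetical names, and `SimpleTuple` merely stands in for Pig's Tuple type. Plan 0 models the `group a ALL` branch (a plain string key) and plan 1 models the `group a by (name, gpa)` branch (a key that is already a tuple).

```java
import java.util.Arrays;
import java.util.List;

public class DemuxKeySketch {

    /** Hypothetical stand-in for a Pig tuple: an ordered list of fields. */
    static final class SimpleTuple {
        final List<Object> fields;

        SimpleTuple(Object... fields) {
            this.fields = Arrays.asList(fields);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleTuple && ((SimpleTuple) o).fields.equals(fields);
        }

        @Override
        public int hashCode() {
            return fields.hashCode();
        }

        @Override
        public String toString() {
            return fields.toString();
        }
    }

    /** Map side: wrap only keys that are not already tuples. */
    static Object wrapIfNeeded(Object key) {
        return (key instanceof SimpleTuple) ? key : new SimpleTuple(key);
    }

    /**
     * Reduce side: unwrap only when the flag for this plan says the key
     * was wrapped on the map side; otherwise pass the key through as-is.
     */
    static Object restoreKey(Object key, int planIndex, boolean[] isKeyWrapped) {
        if (isKeyWrapped[planIndex]) {
            return ((SimpleTuple) key).fields.get(0);
        }
        return key;
    }

    public static void main(String[] args) {
        // Assumption: one flag per merged map plan, in plan order.
        boolean[] isKeyWrapped = { true, false };

        Object allKey = wrapIfNeeded("all");                            // gets wrapped
        Object tupleKey = wrapIfNeeded(new SimpleTuple("alice", 3.5));  // left alone

        System.out.println(restoreKey(allKey, 0, isKeyWrapped));
        System.out.println(restoreKey(tupleKey, 1, isKeyWrapped));
    }
}
```

With an unconditional unwrap, the key for plan 1 would be reduced to its first field, a plain string. That would be consistent with the ClassCastException (String cannot be cast to Tuple) reported in the issue, although the sketch does not reproduce the actual Pig operators.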
> Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-835
>                 URL: https://issues.apache.org/jira/browse/PIG-835
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>         Attachments: PIG-835.patch
>
>
> A query like the following results in an exception on execution:
> {noformat}
> a = load 'mult.input' as (name, age, gpa);
> b = group a ALL;
> c = foreach b generate group, COUNT(a);
> store c into 'foo';
> d = group a by (name, gpa);
> e = foreach d generate flatten(group), MIN(a.age);
> store e into 'bar';
> {noformat}
> Exception on execution:
> 09/06/04 16:56:11 INFO mapred.TaskInProgress: Error from attempt_200906041655_0001_r_000000_3: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:248)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:238)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:320)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.