[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-12 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Attachment: PIG-846-v2.patch

New patch. The only change is that POLocalRearrange.name() no longer adds extra 
information; the earlier patch added it only to enrich explain output, but 
it broke some unit tests. 

The TestHBaseStorage unit test still fails for me, but the failure is unrelated 
to the changes in this patch; I am assuming it is an environment issue on my 
machine.

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835-v2.patch, PIG-835.patch, PIG-846-v2.patch


 A query like the following results in an exception on execution:
 {noformat}
 a = load 'mult.input' as (name, age, gpa);
 b = group a ALL;
 c = foreach b generate group, COUNT(a);
 store c into 'foo';
 d = group a by (name, gpa);
 e = foreach d generate flatten(group), MIN(a.age);
 store e into 'bar';
 {noformat}
 Exception on execution:
 09/06/04 16:56:11 INFO mapred.TaskInProgress: Error from attempt_200906041655_0001_r_00_3: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:312)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:248)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:238)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:320)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:288)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-12 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Comment: was deleted

(was: New patch. The only change is that POLocalRearrange.name() no longer adds 
extra information; the earlier patch added it only to enrich explain output, 
but it broke some unit tests. 

The TestHBaseStorage unit test still fails for me, but the failure is unrelated 
to the changes in this patch; I am assuming it is an environment issue on my 
machine.)

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835-v2.patch, PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-09 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and branch-0.3.

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835-v2.patch, PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-835:
---

Status: Patch Available  (was: Open)

Resubmitting the patch.

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Status: Open  (was: Patch Available)

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Attachment: PIG-835-v2.patch

New patch with findbugs warnings addressed. Essentially, findbugs wanted the 
public static members in PigNullableWritable to be marked final.

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835-v2.patch, PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Status: Patch Available  (was: Open)

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835-v2.patch, PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-05 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Attachment: PIG-835.patch

The root cause is that the current multiQueryOptimizer checks whether the map 
key has the same type in all of the map plans it merges. If the types differ, 
it makes the key type tuple for all map plans: keys that are not tuples are 
wrapped in an extra tuple, and keys that are already of Tuple type are left 
alone (this is ensured in POLocalRearrange). However, the Demux operator, which 
passes the key and bag of values to the merged reduce plan, currently always 
unwraps the tuple whenever the map key types differ. As a result, keys that 
were originally tuples are unwrapped even though they should not be. 

The attached patch fixes this by storing an array of boolean flags in the Demux 
operator to indicate which map keys were wrapped, so that unwrapping occurs 
only where the original map key was not already a tuple and was wrapped.
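
To illustrate the wrap/unwrap mismatch described above, here is a minimal, 
self-contained sketch. It is not the code from the patch: the class name 
DemuxKeySketch, the method names, and the isKeyWrapped array are all 
hypothetical, and a plain java.util.List stands in for a Pig Tuple. It only 
shows why unconditional unwrapping hands a String to a consumer that expects 
a tuple, and how a per-plan flag avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Pig source or the patch.
// A java.util.List is used as a stand-in for org.apache.pig.data.Tuple.
class DemuxKeySketch {

    // Map side: when the merged plans disagree on key type, every key is made
    // a tuple. Keys that are already tuples are left alone.
    static Object wrapIfNeeded(Object key) {
        if (key instanceof List) {
            return key;
        }
        List<Object> wrapped = new ArrayList<>();
        wrapped.add(key);
        return wrapped;
    }

    // Reduce side, behaviour described as the bug: always strip one tuple level.
    static Object unwrapAlways(Object key) {
        return ((List<?>) key).get(0);
    }

    // Reduce side, idea behind the fix: strip only if this plan's key was wrapped.
    static Object unwrapIfWrapped(Object key, boolean[] isKeyWrapped, int planIndex) {
        return isKeyWrapped[planIndex] ? ((List<?>) key).get(0) : key;
    }

    public static void main(String[] args) {
        Object scalarKey = "all";                        // plan 0: a scalar key
        Object tupleKey = Arrays.asList("alice", 3.5);   // plan 1: a (name, gpa) key

        // One flag per merged plan: true means the map side added a wrapper.
        boolean[] isKeyWrapped = {
            !(scalarKey instanceof List),
            !(tupleKey instanceof List)
        };

        Object k0 = wrapIfNeeded(scalarKey);
        Object k1 = wrapIfNeeded(tupleKey);

        // Unconditional unwrapping turns the (name, gpa) key into its first field.
        System.out.println("always unwrap, plan 1: " + unwrapAlways(k1));

        // Flag-driven unwrapping restores each key to its original shape.
        System.out.println("flagged, plan 0: " + unwrapIfWrapped(k0, isKeyWrapped, 0));
        System.out.println("flagged, plan 1: " + unwrapIfWrapped(k1, isKeyWrapped, 1));
    }
}
```

In this sketch the always-unwrap path yields a String for the second plan, 
which is consistent with the ClassCastException in the report, while the 
flag-driven path leaves the original tuple key intact.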

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835.patch





[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-05 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-835:
---

Status: Patch Available  (was: Open)

 Multiquery optimization does not handle the case where the map keys in the 
 split plans have different key types (tuple and non tuple key type)
 --

 Key: PIG-835
 URL: https://issues.apache.org/jira/browse/PIG-835
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-835.patch

