[
https://issues.apache.org/jira/browse/PIG-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-454:
---------------------------
Attachment: PIG-454.patch
CombinerOptimizer is a visitor that walks the entire plan of MapReduceOpers.
It was not resetting its state as it visited each operator. When more than one
operator in the plan could use the combiner, state left over from an earlier
operator caused it to set the wrong key in the combiner.
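
To illustrate the kind of bug described above, here is a minimal sketch of a
plan visitor that leaks per-operator state. It is not the code from the patch:
SketchOper, SketchCombinerVisitor, the resetPerOper flag, and the key labels
"nameKey" and "allKey" are all hypothetical names made up for this example, and
the real CombinerOptimizer is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one MapReduce operator in the plan.
final class SketchOper {
    final String name;
    final String groupKey;     // illustrative label for the operator's own key
    final boolean combinable;  // whether this operator can use a combiner
    String combinerKey;        // filled in by the visitor; null if unused

    SketchOper(String name, String groupKey, boolean combinable) {
        this.name = name;
        this.groupKey = groupKey;
        this.combinable = combinable;
    }
}

// Hypothetical visitor that keeps state in a field while walking the plan.
final class SketchCombinerVisitor {
    private final boolean resetPerOper;
    private String currentKey; // per-operator state held on the visitor

    SketchCombinerVisitor(boolean resetPerOper) {
        this.resetPerOper = resetPerOper;
    }

    void visit(List<SketchOper> plan) {
        for (SketchOper op : plan) {
            if (resetPerOper) {
                currentKey = null; // the fix: start clean for every operator
            }
            if (!op.combinable) {
                continue;
            }
            // Without the reset, a key recorded for an earlier operator
            // is still here and gets reused for this one.
            if (currentKey == null) {
                currentKey = op.groupKey;
            }
            op.combinerKey = currentKey;
        }
    }
}

final class SketchPlan {
    // Two combinable operators in a row, loosely mirroring a
    // group-by followed by a group-all.
    static List<SketchOper> twoCombinableOpers() {
        List<SketchOper> plan = new ArrayList<>();
        plan.add(new SketchOper("group-by-name", "nameKey", true));
        plan.add(new SketchOper("group-all", "allKey", true));
        return plan;
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        List<SketchOper> buggy = SketchPlan.twoCombinableOpers();
        new SketchCombinerVisitor(false).visit(buggy);
        System.out.println("no reset: " + buggy.get(1).combinerKey);

        List<SketchOper> fixed = SketchPlan.twoCombinableOpers();
        new SketchCombinerVisitor(true).visit(fixed);
        System.out.println("reset:    " + fixed.get(1).combinerKey);
    }
}
```

Without the reset, the second operator ends up with the first operator's key.
With the reset, each operator gets its own.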
> group by followed by group ALL causes error in reduce
> -----------------------------------------------------
>
> Key: PIG-454
> URL: https://issues.apache.org/jira/browse/PIG-454
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Assignee: Alan Gates
> Fix For: types_branch
>
> Attachments: PIG-454.patch
>
>
> Script:
> {code}
> a = load 'st10k' as (name, age, gpa);
> b = group a by name;
> c = foreach b generate flatten(group), COUNT(a) as cnt;
> d = group c all;
> e = foreach d generate AVG(c.cnt);
> dump e;
> {code}
> Error:
> {noformat}
> 2008-09-23 17:58:12,002 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Job failed!
> 2008-09-23 17:58:12,004 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (map) tip_200809051428_0117_m_000000java.io.IOException: wrong key class: org.apache.pig.impl.io.NullableTuple is not class org.apache.pig.impl.io.NullableText
> at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:995)
> at org.apache.hadoop.mapred.MapTask$CombineOutputCollector.collect(MapTask.java:1079)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:155)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:56)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:872)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:779)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:691)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> ...
> {noformat}