[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Richard Ding (JIRA) Wed, 25 Nov 2009 14:11:05 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782651#action_12782651
 ]


Richard Ding commented on PIG-1108:
-----------------------------------

Either disableing MQ optimization or disabling second key optimization works on 
0.6.0 branch.

In the short term, the proposed solution is for MQ optimizer to not merge any 
MR job that is annotated with "UseSecondaryKey".  We need further investigation 
to see if there's any performance advantages for MQ optimizer to merge MR jobs 
with "UseSecondaryKey" annotation.   

> Incorrect map output key type in MultiQuery optimization
> --------------------------------------------------------
>
>                 Key: PIG-1108
>                 URL: https://issues.apache.org/jira/browse/PIG-1108
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Assignee: Richard Ding
>
> When trying to merge 2 split plans, one of which never progresses along an 
> M/R boundary, PIG sets the map-output key type incorrectly resulting in the 
> following error:-
> java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableText, recieved 
> org.apache.pig.impl.io.NullableTuple
>       at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
>       at 
> org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Here is a small script to be used a reproducible test case
> rmf plan1
> rmf plan2
> A = LOAD 'data' USING PigStorage() as (a: int, b: chararray);
> SPLIT A into plan1 IF (a>5), plan2 IF (a<5);
> B = GROUP plan1 BY b;
> C = FOREACH B {
>               tmp = ORDER plan1 BY a desc;
>               GENERATE FLATTEN(group) as b, tmp;
>               };
> D = FILTER C BY b is not null;
> STORE D into 'plan1';
> STORE plan2 into 'plan2';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Reply via email to