[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Richard Ding (JIRA) Wed, 25 Nov 2009 11:38:05 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782575#action_12782575
 ]


Richard Ding commented on PIG-1108:
-----------------------------------

The problme is that the secondary key optimization doesn't work well with 
multiquery optimization where one splittee has secondary key and the other 
splittee has no secondary key.

The work around for now is to disable either multiquery optimization (with -M) 
or secondary key optimization (with -Dpig.exec.nosecondarykey=true).

To fix this, we can modify the multiquery optimizer (since now secondary key 
optimizer runs before the multiquery optimizer) to take into account the 
exsitence of secondary key optimization.

 

> Incorrect map output key type in MultiQuery optimization
> --------------------------------------------------------
>
>                 Key: PIG-1108
>                 URL: https://issues.apache.org/jira/browse/PIG-1108
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ankur
>            Assignee: Richard Ding
>
> When trying to merge 2 split plans, one of which never progresses along an 
> M/R boundary, PIG sets the map-output key type incorrectly resulting in the 
> following error:-
> java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableText, recieved 
> org.apache.pig.impl.io.NullableTuple
>       at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
>       at 
> org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Here is a small script to be used a reproducible test case
> rmf plan1
> rmf plan2
> A = LOAD 'data' USING PigStorage() as (a: int, b: chararray);
> SPLIT A into plan1 IF (a>5), plan2 IF (a<5);
> B = GROUP plan1 BY b;
> C = FOREACH B {
>               tmp = ORDER plan1 BY a desc;
>               GENERATE FLATTEN(group) as b, tmp;
>               };
> D = FILTER C BY b is not null;
> STORE D into 'plan1';
> STORE plan2 into 'plan2';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

Reply via email to