[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783553#action_12783553
 ] 

Ankur commented on PIG-1114:
----------------------------

The error thrown is:

java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableTuple, recieved org.apache.pig.impl.io.NullableText
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:159)
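
For readers unfamiliar with this failure mode: the message indicates that the map output collector was set up to accept one key class (NullableTuple) but was handed a key of another class (NullableText). The following is only a rough sketch of that kind of check, not Hadoop's or Pig's actual code. The class names KeyTypeCheckSketch, Collector and tryCollect are hypothetical, and the nested NullableTuple/NullableText classes are empty stand-ins for the real Pig types.

```java
import java.io.IOException;

public class KeyTypeCheckSketch {
    // Hypothetical empty stand-ins for Pig's key wrapper types.
    static class NullableTuple {}
    static class NullableText {}

    // Simplified model of a map-output collector that accepts exactly one key class.
    static class Collector {
        private final Class<?> keyClass;

        Collector(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        // Rejects any key whose runtime class differs from the configured one.
        void collect(Object key) throws IOException {
            if (key.getClass() != keyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + keyClass.getName() + ", received " + key.getClass().getName());
            }
        }
    }

    // Helper for the demo: returns "ok" or the exception message.
    static String tryCollect(Collector collector, Object key) {
        try {
            collector.collect(key);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector(NullableTuple.class);
        System.out.println(tryCollect(collector, new NullableTuple()));
        System.out.println(tryCollect(collector, new NullableText()));
    }
}
```

In this sketch the first call is accepted and the second is rejected with a "Type mismatch in key from map" message, which mirrors the shape of the error reported above.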



> MultiQuery optimization throws error when merging 2 level splits
> ----------------------------------------------------------------
>
>                 Key: PIG-1114
>                 URL: https://issues.apache.org/jira/browse/PIG-1114
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankur
>            Priority: Critical
>
> Multi-query optimization throws an error when merging two-level splits.
> The following script reproduces the error:
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
>         id,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
>         name,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
