[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783991#action_12783991
 ] 

Richard Ding commented on PIG-1114:
-----------------------------------

This exception occurs because the MultiQuery optimizer does not recursively 
set the data type in local rearrange operators; it sets it only on the first 
level. Setting it at every level is required when the merged jobs do not have 
the same map key types.
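
To illustrate the difference between a first-level and a recursive update, here is a minimal toy sketch in Python. It is not Pig's code: the `LocalRearrange` and `Split` classes, the function names, the operator names, and the string type labels are all hypothetical stand-ins invented for this example, and the plan shape is only assumed to resemble a 2-level split.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class LocalRearrange:
    """Hypothetical stand-in for a local rearrange operator (not Pig's class)."""
    name: str
    key_type: str


@dataclass
class Split:
    """Hypothetical split node; its branches may contain further splits."""
    name: str
    branches: List[Union["Split", LocalRearrange]] = field(default_factory=list)


def set_key_type_first_level(split: Split, key_type: str) -> None:
    """Update only the rearrange operators directly under `split`."""
    for op in split.branches:
        if isinstance(op, LocalRearrange):
            op.key_type = key_type


def set_key_type_recursive(split: Split, key_type: str) -> None:
    """Update every rearrange operator, descending into nested splits."""
    for op in split.branches:
        if isinstance(op, LocalRearrange):
            op.key_type = key_type
        else:
            set_key_type_recursive(op, key_type)


def collect_key_types(split: Split) -> Dict[str, str]:
    """Return {operator name: key type} for all rearranges in the plan."""
    result: Dict[str, str] = {}
    for op in split.branches:
        if isinstance(op, LocalRearrange):
            result[op.name] = op.key_type
        else:
            result.update(collect_key_types(op))
    return result


def build_plan() -> Split:
    """Build an assumed 2-level plan: one direct rearrange, one nested split."""
    nested = Split("nested", [
        LocalRearrange("lr_nested_a", "int"),
        LocalRearrange("lr_nested_b", "chararray"),
    ])
    return Split("top", [LocalRearrange("lr_top", "int"), nested])


if __name__ == "__main__":
    shallow = build_plan()
    set_key_type_first_level(shallow, "tuple")
    print(collect_key_types(shallow))  # nested operators keep their old types

    deep = build_plan()
    set_key_type_recursive(deep, "tuple")
    print(collect_key_types(deep))  # every operator now shares one type
```

In this toy model the first-level pass leaves the nested operators with mismatched key types, which is the kind of inconsistency the comment describes; the recursive pass gives every operator in the plan the same type.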

> MultiQuery optimization throws error when merging 2 level splits
> ----------------------------------------------------------------
>
>                 Key: PIG-1114
>                 URL: https://issues.apache.org/jira/browse/PIG-1114
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankur
>            Assignee: Richard Ding
>            Priority: Critical
>             Fix For: 0.6.0
>
>
> Multi-query optimization throws an error when merging 2-level splits. 
> The following script reproduces the error:
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
>         id,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
>         name,
>         count,
>         total,
>         (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
