MultiQuery optimization throws error when merging 2 level splits
----------------------------------------------------------------
Key: PIG-1114
URL: https://issues.apache.org/jira/browse/PIG-1114
Project: Pig
Issue Type: Bug
Reporter: Ankur
Priority: Critical
Multi-query optimization throws an error when merging two-level splits. The
following script reproduces the error:
data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);

ids = FOREACH data GENERATE id;
allId = GROUP ids ALL;
allIdCount = FOREACH allId GENERATE group AS allId, COUNT(ids) AS total;
idGroup = GROUP ids BY id;
idGroupCount = FOREACH idGroup GENERATE group AS id, COUNT(ids) AS count;
countTotal = CROSS idGroupCount, allIdCount;
idCountTotal = FOREACH countTotal GENERATE
    id,
    count,
    total,
    (double)count / (double)total AS proportion;
orderedCounts = ORDER idCountTotal BY count DESC;
STORE orderedCounts INTO 'mq_problem/ids';

names = FOREACH data GENERATE name;
allNames = GROUP names ALL;
allNamesCount = FOREACH allNames GENERATE group AS namesAll, COUNT(names) AS total;
nameGroup = GROUP names BY name;
nameGroupCount = FOREACH nameGroup GENERATE group AS name, COUNT(names) AS count;
namesCrossed = CROSS nameGroupCount, allNamesCount;
nameCountTotal = FOREACH namesCrossed GENERATE
    name,
    count,
    total,
    (double)count / (double)total AS proportion;
nameCountsOrdered = ORDER nameCountTotal BY count DESC;
STORE nameCountsOrdered INTO 'mq_problem/names';
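
For reference, the script splits at two levels: 'data' feeds both the ids and
the names branches, and each branch in turn feeds both a GROUP ... ALL and a
GROUP ... BY. The result each STORE should produce can be sketched in plain
Python as below. This is only an illustration of the intended output, not of
how Pig executes the plan; the helper name count_proportions and the sample
rows are hypothetical, and the sketch assumes the input contains no nulls.

```python
from collections import Counter


def count_proportions(values):
    """Mimic one branch of the script for a single projected column.

    Hypothetical helper: per-value count (GROUP ... BY), overall total
    (GROUP ... ALL), the CROSS of the two, then ORDER BY count DESC.
    Assumes there are no null values in the column.
    """
    total = len(values)          # GROUP ... ALL followed by COUNT
    per_value = Counter(values)  # GROUP ... BY followed by COUNT
    # CROSS with a single-row relation attaches the total to every row.
    rows = [(v, c, total, c / total) for v, c in per_value.items()]
    # ORDER ... BY count DESC
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample rows standing in for the 'data' relation: (id, name).
data = [(1, "a"), (1, "b"), (2, "a"), (1, "a")]

# First-level split: the same input feeds both branches.
ids = [row[0] for row in data]
names = [row[1] for row in data]

id_rows = count_proportions(ids)      # what 'mq_problem/ids' should hold
name_rows = count_proportions(names)  # what 'mq_problem/names' should hold
```

Each output row has the shape (value, count, total, proportion), matching the
fields generated by idCountTotal and nameCountTotal.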