[ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-364:
---------------------------
Attachment: PIG-364-3.patch
Hi, Shravan,
Thank you for your detailed explanation. Here is the modified patch, which addresses
your comments 1 and 3. I tested it using the following script:
a = load 'studenttab10k';
b = group a by $0 parallel 10;
c = limit b 10;
split c into c1 if $0 lt 'bob white', c2 if $0 gte 'bob white';
c12 = group c1 by $0;
c22 = group c2 by $0;
c4 = union c12, c22;
dump c4;
> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
> Key: PIG-364
> URL: https://issues.apache.org/jira/browse/PIG-364
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: types_branch
>
> Attachments: limitsplit.png, PIG-364-2.patch, PIG-364-3.patch,
> PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with n
> reducers, we can get up to n*k output records instead of k.
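The following Python sketch is a minimal illustration, not Pig code and not necessarily what PIG-364-3.patch does. It models why running Limit(k) independently in each of n reducers can return up to n*k records, and shows one way to bound the result: re-applying Limit(k) in a follow-up stage that has a single reducer. The function names and the hash partitioning are assumptions made for illustration only.

```python
from itertools import islice


def reducer_limit(records, k):
    """Hypothetical per-reducer Limit(k): keep the first k records this reducer sees."""
    return list(islice(records, k))


def partition(records, n):
    """Hash-partition records across n reducers (assumed partitioning scheme)."""
    parts = [[] for _ in range(n)]
    for r in records:
        parts[hash(r) % n].append(r)
    return parts


def limit_per_reducer(records, n, k):
    # Problematic plan: Limit(k) runs separately in each of the n reducers,
    # so the combined output can hold up to n*k records.
    out = []
    for part in partition(records, n):
        out.extend(reducer_limit(part, k))
    return out


def limit_with_final_stage(records, n, k):
    # One way to bound the output: feed the per-reducer results into a
    # second stage with a single reducer that applies Limit(k) again.
    return reducer_limit(limit_per_reducer(records, n, k), k)


if __name__ == "__main__":
    records = list(range(1000))  # small ints hash to themselves in CPython
    print(len(limit_per_reducer(records, n=10, k=10)))       # 100 (n*k)
    print(len(limit_with_final_stage(records, n=10, k=10)))  # 10 (k)
```

The extra single-reducer stage adds a pass over the data, but since each upstream reducer already forwards at most k records, that final reducer sees at most n*k records.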