[ 
https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-364:
---------------------------

    Attachment: PIG-364-3.patch

Hi, Shravan,
Thank you for your detailed explanation. Here is the modified patch, which addresses
your comments 1 and 3. I tested it using the following script:

a = load 'studenttab10k';
b = group a by $0 parallel 10;
c = limit b 10;
split c into c1 if $0 lt 'bob white', c2 if $0 gte 'bob white';
c12 = group c1 by $0;
c22 = group c2 by $0;
c4 = union c12, c22;
dump c4;


> Limit returns incorrect records when we use multiple reducers
> -------------------------------------------------------------
>
>                 Key: PIG-364
>                 URL: https://issues.apache.org/jira/browse/PIG-364
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: types_branch
>
>         Attachments: limitsplit.png, PIG-364-2.patch, PIG-364-3.patch, PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with n
> reducers, each reducer applies the limit to its own partition only, so we can
> get up to n*k output records instead of k.
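To illustrate the arithmetic in the description, here is a minimal Python simulation. It is not Pig code: the function names and the hash partitioning on the first field are hypothetical stand-ins for Pig's shuffle. It shows n reducers each applying Limit(k) and together emitting up to n*k records, plus one possible remedy, a final pass through a single reducer that applies Limit(k) again. This is a sketch of the idea under those assumptions, not necessarily what PIG-364-3.patch implements.

```python
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def partition(records: Sequence[Record], n_reducers: int) -> List[List[Record]]:
    """Hypothetical shuffle: hash-partition records on their first field."""
    buckets: List[List[Record]] = [[] for _ in range(n_reducers)]
    for rec in records:
        buckets[hash(rec[0]) % n_reducers].append(rec)
    return buckets


def per_reducer_limit(records: Sequence[Record], n_reducers: int, k: int) -> List[Record]:
    """Buggy plan: every reducer applies Limit(k) to its own partition only."""
    out: List[Record] = []
    for bucket in partition(records, n_reducers):
        out.extend(bucket[:k])  # each reducer may emit up to k records
    return out  # total can reach n_reducers * k


def limit_with_final_single_reducer(records: Sequence[Record], n_reducers: int, k: int) -> List[Record]:
    """Assumed fix: keep the per-reducer limit as a pre-filter, then apply
    Limit(k) once more in a single reducer so the global total is at most k."""
    partial = per_reducer_limit(records, n_reducers, k)
    return partial[:k]


# Integer keys hash to themselves, so the partitioning is deterministic here.
records = [(i, f"student{i}") for i in range(100)]
print(len(per_reducer_limit(records, 10, 10)))                # 100 (n*k)
print(len(limit_with_final_single_reducer(records, 10, 10)))  # 10 (k)
```

The per-reducer limit is still worth keeping in this sketch: it caps what each reducer sends on, so the final single reducer only has to look at n*k records rather than the whole input.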

-- 
This message is automatically generated by JIRA.