[ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620799#action_12620799 ]
Daniel Dai commented on PIG-364:
--------------------------------
There seems to be no perfect solution. Here are three possible approaches:
1. If there is a limit in the reducer and the number of reducers is > 1, add
another map-reduce job after it with only 1 reducer
Cons: extra overhead
2. Instead of another map-reduce job, manipulate the output files directly,
keeping only the top k records in the output
Cons: unorthodox, extra overhead (but less than option 1)
3. If there is a limit in the reducer, change the degree of parallelism of the
reducer to 1
Cons: cannot take advantage of parallel processing in the reducer
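To make the trade-offs concrete, here is a minimal Java sketch of the three
options. It is not Pig code: it assumes each reducer's input is an in-memory
list, that records are spread round-robin, and every name in it
(LimitTreatments, limitPerReducer, roundRobin, etc.) is hypothetical. It only
checks how many records come out, which is what the bug is about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Pig code) of the three treatments for Limit(k)
// when the reducer stage runs with n reducers.
public class LimitTreatments {

    // The Limit(k) operator: keep at most k records from one stream.
    static <T> List<T> limit(List<T> records, int k) {
        return new ArrayList<>(records.subList(0, Math.min(k, records.size())));
    }

    // Concatenate several streams (e.g. part files) in order.
    static <T> List<T> concat(List<List<T>> lists) {
        List<T> all = new ArrayList<>();
        for (List<T> list : lists) {
            all.addAll(list);
        }
        return all;
    }

    // Current behaviour: Limit(k) runs inside every reducer, producing one
    // output part per reducer (up to n*k records in total).
    static <T> List<List<T>> limitPerReducer(List<List<T>> partitions, int k) {
        List<List<T>> parts = new ArrayList<>();
        for (List<T> partition : partitions) {
            parts.add(limit(partition, k));
        }
        return parts;
    }

    // Option 1: an extra map-reduce job with a single reducer re-applies
    // Limit(k) to the at most n*k records left after the parallel stage.
    static <T> List<T> extraSingleReducerJob(List<List<T>> parts, int k) {
        return limit(concat(parts), k);
    }

    // Option 2: no extra job; read the part files in order and stop
    // once k records have been kept.
    static <T> List<T> truncateOutputFiles(List<List<T>> parts, int k) {
        List<T> kept = new ArrayList<>();
        for (List<T> part : parts) {
            for (T record : part) {
                if (kept.size() == k) {
                    return kept;
                }
                kept.add(record);
            }
        }
        return kept;
    }

    // Option 3: parallelism 1, so the whole input reaches one reducer
    // and Limit(k) is applied exactly once.
    static <T> List<T> singleReducer(List<List<T>> partitions, int k) {
        return limit(concat(partitions), k);
    }

    // Assumed partitioning: records 0..records-1 spread round-robin over n reducers.
    static List<List<Integer>> roundRobin(int records, int n) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records; i++) {
            partitions.get(i % n).add(i);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int n = 4;   // number of reducers (example value)
        int k = 10;  // LIMIT value (example value)
        List<List<Integer>> partitions = roundRobin(100, n);
        List<List<Integer>> parts = limitPerReducer(partitions, k);
        System.out.println(concat(parts).size());                   // 40 (n*k)
        System.out.println(extraSingleReducerJob(parts, k).size()); // 10
        System.out.println(truncateOutputFiles(parts, k).size());   // 10
        System.out.println(singleReducer(partitions, k).size());    // 10
    }
}
```

Options 1 and 3 both end in a single Limit(k); the difference is that option 1
only funnels the at most n*k records that survive the parallel stage through
one reducer, while option 3 sends the full input to a single reducer.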
> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
> Key: PIG-364
> URL: https://issues.apache.org/jira/browse/PIG-364
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: types_branch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, in the
> case of n reducers, we will get up to n*k output records.
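A minimal Java sketch of the reported behaviour, assuming (hypothetically)
that each reducer's input is an in-memory list and that records are spread
round-robin across reducers. PerReducerLimitDemo and its methods are
illustrative names, not Pig APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: Limit(k) applied independently in each of
// n reducers, with the per-reducer outputs read back as one result.
public class PerReducerLimitDemo {

    // Limit(k) inside one reducer: keep at most k records.
    static <T> List<T> limit(List<T> records, int k) {
        return new ArrayList<>(records.subList(0, Math.min(k, records.size())));
    }

    // Run Limit(k) in every reducer and concatenate the outputs,
    // as if the n part files were read together.
    static <T> List<T> runLimitInEachReducer(List<List<T>> partitions, int k) {
        List<T> output = new ArrayList<>();
        for (List<T> partition : partitions) {
            output.addAll(limit(partition, k));
        }
        return output;
    }

    // Assumed partitioning: records 0..records-1 spread round-robin over n reducers.
    static List<List<Integer>> roundRobin(int records, int n) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records; i++) {
            partitions.get(i % n).add(i);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int n = 4;   // number of reducers (example value)
        int k = 10;  // LIMIT value (example value)
        List<Integer> result = runLimitInEachReducer(roundRobin(100, n), k);
        System.out.println(result.size()); // 40, i.e. n*k rather than k
    }
}
```

With 25 records per reducer, each reducer correctly keeps 10, but the job as a
whole returns 40.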
--
This message is automatically generated by JIRA.