[ https://issues.apache.org/jira/browse/PIG-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-364:
---------------------------
Attachment: PIG-364.patch
This patch takes approach 1: it adds one additional map-reduce operator with
1 reducer when the requested parallelism is > 1. The behavior of limit is now:
1. If the map plan is closed before the POLimit operator, we put POLimit in the
reduce plan and grant the requested parallelism. If the requested parallelism
is > 1, we close the reduce plan and add one additional map-reduce operator
with 1 reducer.
2. If the map plan is open before the POLimit operator, we put POLimit in the
map plan, close the map plan, add another POLimit to the reduce plan, and set
the parallelism of this map-reduce operator to 1. Although POLimit creates a
map-reduce boundary in this case, we do not associate a parallel option with
the limit keyword. I believe providing a parallel option with limit would
confuse users, because it is relatively hard to explain whether the option
will be granted or not.
3. In the limited sort case, we have a POSort with limit <> -1. If the
parallelism for POSort is > 1, we add one additional map-reduce operator with
1 reducer.
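
The three cases above can be sketched as a small decision function. This is
only an illustration in Python, not the code in the patch: the Job class, the
function names, and the operator labels are hypothetical, and the assumption
that the extra map-reduce operator re-applies the limit is mine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical stand-in for one map-reduce operator (limit-related ops only)."""
    map_ops: List[str]
    reduce_ops: List[str]
    reducers: int


def plan_limit(map_plan_closed: bool, requested_parallelism: int) -> List[Job]:
    """Sketch of cases 1 and 2: where POLimit goes and how many reducers run."""
    if map_plan_closed:
        # Case 1: POLimit goes in the reduce plan; requested parallelism is granted.
        jobs = [Job([], ["POLimit"], requested_parallelism)]
        if requested_parallelism > 1:
            # Assumption: the additional 1-reducer job applies the limit again.
            jobs.append(Job([], ["POLimit"], 1))
        return jobs
    # Case 2: POLimit in the map plan, another in the reduce plan, 1 reducer.
    return [Job(["POLimit"], ["POLimit"], 1)]


def plan_limited_sort(sort_parallelism: int) -> List[Job]:
    """Sketch of case 3: POSort carrying a limit, plus an extra job if parallel."""
    jobs = [Job([], ["POSort(limit)"], sort_parallelism)]
    if sort_parallelism > 1:
        jobs.append(Job([], ["POLimit"], 1))
    return jobs
```

In this sketch every plan ends in a job with exactly 1 reducer whenever more
than one reducer could otherwise emit limited output.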
> Limit return incorrect records when we use multiple reducer
> -----------------------------------------------------------
>
> Key: PIG-364
> URL: https://issues.apache.org/jira/browse/PIG-364
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: types_branch
>
> Attachments: PIG-364.patch
>
>
> Currently we put the Limit(k) operator in the reducer plan. However, with
> n reducers, we get up to n*k output records.
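
A toy simulation of the n*k problem described above, as a rough sketch. The
partitions are made-up data and reduce_with_limit is a hypothetical helper,
not Pig code; it only mimics each reducer applying Limit(k) independently.

```python
from typing import List


def reduce_with_limit(partitions: List[List[int]], k: int) -> List[int]:
    """Each simulated reducer keeps at most k records from its own partition."""
    out: List[int] = []
    for part in partitions:
        out.extend(part[:k])
    return out


# n = 3 reducers, each receiving 4 records (made-up data).
partitions = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
k = 2

# Every reducer honors the limit locally, so the job emits n*k = 6 records.
per_reducer = reduce_with_limit(partitions, k)

# Passing that output through a single reducer trims it back to k records.
final = reduce_with_limit([per_reducer], k)
```

Here per_reducer holds 6 records while final holds 2, which is why a limit
applied by several reducers needs one more single-reducer pass.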