[
https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771579#action_12771579
]
Sriranjan Manjunath commented on PIG-1048:
------------------------------------------
This issue occurred because, for a key with 1 value, we were
allocating 2 reducers. The culprit was:
// number of reducers
- Integer cnt = 0;
- if (minIndex < maxIndex) {
- cnt = maxIndex - minIndex;
- } else {
- cnt = totalReducers[0] + maxIndex - minIndex;
- }
-
If maxIndex = minIndex = 0 and totalReducers was 1, cnt would get a value of 1
instead of 0!
cnt was 0-based, whereas totalReducers was 1-based. This
resulted in POPartitionRearrange distributing the tuple to one more reducer
than required.
The fix is to always set the value of cnt to "maxIndex - minIndex". The code
guarantees that maxIndex is never less than minIndex.
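
To make the off-by-one concrete, here is a minimal, self-contained sketch of the two computations for the case described above (maxIndex = minIndex = 0, totalReducers = 1). The class and method names are hypothetical and exist only for illustration. They are not part of Pig, and totalReducers is passed as a plain int rather than read from totalReducers[0].

```java
// Hypothetical sketch: names are illustrative and not taken from the Pig code base.
class ReducerCountSketch {

    // Count as computed by the removed lines. The else branch also runs
    // when minIndex == maxIndex, so the 1-based total is added to a
    // 0-based difference.
    static int oldCount(int minIndex, int maxIndex, int totalReducers) {
        if (minIndex < maxIndex) {
            return maxIndex - minIndex;
        }
        return totalReducers + maxIndex - minIndex;
    }

    // Count as described by the fix: always the 0-based difference.
    static int fixedCount(int minIndex, int maxIndex) {
        return maxIndex - minIndex;
    }

    public static void main(String[] args) {
        System.out.println("old:   " + oldCount(0, 0, 1)); // prints "old:   1"
        System.out.println("fixed: " + fixedCount(0, 0));  // prints "fixed: 0"
    }
}
```

With the old computation the single-value key gets a count of 1, which matches the extra reducer and the duplicated rows in the report. With the fixed computation the count is 0.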
> inner join using 'skewed' produces multiple rows for keys with single row in
> both input relations
> -------------------------------------------------------------------------------------------------
>
> Key: PIG-1048
> URL: https://issues.apache.org/jira/browse/PIG-1048
> Project: Pig
> Issue Type: Bug
> Reporter: Thejas M Nair
> Assignee: Sriranjan Manjunath
>
> ${code}
> grunt> cat students.txt
> asdfxc M 23 12.44
> qwer F 21 14.44
> uhsdf M 34 12.11
> zxldf M 21 12.56
> qwer F 23 145.5
> oiue M 54 23.33
> l1 = load 'students.txt';
> l2 = load 'students.txt';
> j = join l1 by $0, l2 by $0 ;
> store j into 'tmp.txt'
> grunt> cat tmp.txt
> oiue M 54 23.33 oiue M 54 23.33
> oiue M 54 23.33 oiue M 54 23.33
> qwer F 21 14.44 qwer F 21 14.44
> qwer F 21 14.44 qwer F 23 145.5
> qwer F 23 145.5 qwer F 21 14.44
> qwer F 23 145.5 qwer F 23 145.5
> uhsdf M 34 12.11 uhsdf M 34 12.11
> uhsdf M 34 12.11 uhsdf M 34 12.11
> zxldf M 21 12.56 zxldf M 21 12.56
> zxldf M 21 12.56 zxldf M 21 12.56
> asdfxc M 23 12.44 asdfxc M 23 12.44
> asdfxc M 23 12.44 asdfxc M 23 12.44$
> ${code}