[ https://issues.apache.org/jira/browse/PIG-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771579#action_12771579 ]
Sriranjan Manjunath commented on PIG-1048:
------------------------------------------

This issue occurred because we were allocating 2 reducers for a key with 1 value. The culprit was:

         // number of reducers
-        Integer cnt = 0;
-        if (minIndex < maxIndex) {
-            cnt = maxIndex - minIndex;
-        } else {
-            cnt = totalReducers[0] + maxIndex - minIndex;
-        }
-

If maxIndex = minIndex = 0 and totalReducers was 1, cnt would get a value of 1 instead of 0! cnt was 0-based, whereas totalReducers was 1-based. This resulted in POPartitionRearrange distributing the tuple to one more reducer than required.

The fix is to always set the value of cnt to "maxIndex - minIndex". The code guarantees that maxIndex is always greater than or equal to minIndex.

> inner join using 'skewed' produces multiple rows for keys with single row in
> both input relations
> -------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1048
>                 URL: https://issues.apache.org/jira/browse/PIG-1048
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Sriranjan Manjunath
>
> ${code}
> grunt> cat students.txt
> asdfxc M 23 12.44
> qwer F 21 14.44
> uhsdf M 34 12.11
> zxldf M 21 12.56
> qwer F 23 145.5
> oiue M 54 23.33
> l1 = load 'students.txt';
> l2 = load 'students.txt';
> j = join l1 by $0, l2 by $0 ;
> store j into 'tmp.txt'
> grunt> cat tmp.txt
> oiue M 54 23.33 oiue M 54 23.33
> oiue M 54 23.33 oiue M 54 23.33
> qwer F 21 14.44 qwer F 21 14.44
> qwer F 21 14.44 qwer F 23 145.5
> qwer F 23 145.5 qwer F 21 14.44
> qwer F 23 145.5 qwer F 23 145.5
> uhsdf M 34 12.11 uhsdf M 34 12.11
> uhsdf M 34 12.11 uhsdf M 34 12.11
> zxldf M 21 12.56 zxldf M 21 12.56
> zxldf M 21 12.56 zxldf M 21 12.56
> asdfxc M 23 12.44 asdfxc M 23 12.44
> asdfxc M 23 12.44 asdfxc M 23 12.44$
> ${code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
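
For reference, the off-by-one described in the comment can be reproduced in isolation. This is only a minimal sketch, not the actual Pig source: the class name ReducerCountSketch and the methods buggyCount, fixedCount, and reducersUsed are hypothetical, and totalReducers is simplified to a plain int instead of the array element used in the removed code. The reducersUsed helper assumes, following the comment's statement that cnt is 0-based, that a tuple is sent to cnt + 1 reducers.

```java
public class ReducerCountSketch {

    // Pre-fix logic as quoted in the comment (totalReducers simplified to an int).
    static int buggyCount(int minIndex, int maxIndex, int totalReducers) {
        int cnt;
        if (minIndex < maxIndex) {
            cnt = maxIndex - minIndex;
        } else {
            // Also taken when minIndex == maxIndex, which adds totalReducers by mistake.
            cnt = totalReducers + maxIndex - minIndex;
        }
        return cnt;
    }

    // Post-fix logic as described in the comment: always maxIndex - minIndex.
    static int fixedCount(int minIndex, int maxIndex) {
        return maxIndex - minIndex;
    }

    // Assumption for illustration: cnt is 0-based, so cnt + 1 reducers receive the tuple.
    static int reducersUsed(int cnt) {
        return cnt + 1;
    }

    public static void main(String[] args) {
        int buggy = buggyCount(0, 0, 1);
        int fixed = fixedCount(0, 0);
        System.out.println("buggy cnt=" + buggy + " reducers=" + reducersUsed(buggy)); // buggy cnt=1 reducers=2
        System.out.println("fixed cnt=" + fixed + " reducers=" + reducersUsed(fixed)); // fixed cnt=0 reducers=1
    }
}
```

With maxIndex = minIndex = 0 and a single reducer, the old branch yields cnt = 1, which matches the duplicated rows for single-row keys in the report; the simplified expression yields cnt = 0.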