[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-792:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

The code has been committed. Thanks, Sri and Ying, for this important contribution.

PERFORMANCE: Support skewed join in pig
---
Key: PIG-792
URL: https://issues.apache.org/jira/browse/PIG-792
Project: Pig
Issue Type: Improvement
Reporter: Sriranjan Manjunath
Attachments: skewedjoin.patch

Fragmented replicated join has a few limitations:
- One of the tables needs to be loaded into memory
- Join is limited to two tables

Skewed join partitions the table and joins the records in the reduce phase. It computes a histogram of the key space to account for skew in the input records. Further, it adjusts the number of reducers depending on the key distribution.

We need to implement the skewed join in Pig.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
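The skewed-join idea in the issue description can be illustrated with a small, self-contained Java sketch. It is not the code from skewedjoin.patch; every name in it (SkewedJoinSketch, reducersPerKey, the maxShare threshold) and the policy of giving a skewed key consecutive reducers starting at its hash slot are hypothetical assumptions. The sketch builds a histogram from a sample of one input, splits any key whose sampled share exceeds what one reducer should handle across several reducers, raises the reducer count if needed, and copies matching records of the other input to every reducer holding a slice of that key, so each matching pair still meets on some reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of skewed-join partitioning; not Pig's implementation.
public class SkewedJoinSketch {

    // Histogram of the key space, built from a sample of the skewed (left) input.
    static Map<String, Integer> histogram(List<String> sampleKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : sampleKeys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    // Keys whose sampled share exceeds maxShare (assumed threshold: the fraction
    // of the input one reducer should handle) are split across several reducers.
    static Map<String, Integer> reducersPerKey(Map<String, Integer> hist, int sampleSize, double maxShare) {
        Map<String, Integer> plan = new HashMap<>();
        for (Map.Entry<String, Integer> e : hist.entrySet()) {
            double share = (double) e.getValue() / sampleSize;
            int slices = (int) Math.ceil(share / maxShare);
            if (slices > 1) {
                plan.put(e.getKey(), slices);
            }
        }
        return plan;
    }

    // Adjust the reducer count so the most skewed key can get all of its slices.
    static int reducersNeeded(Map<String, Integer> plan, int defaultReducers) {
        int n = defaultReducers;
        for (int slices : plan.values()) {
            n = Math.max(n, slices);
        }
        return n;
    }

    // Non-negative hash partition (same masking as Hadoop's HashPartitioner).
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Left (sampled) input: a skewed key's records are spread round-robin over
    // its slices, i.e. consecutive reducers starting at the key's hash slot.
    static int partitionLeft(String key, long recordIndex, Map<String, Integer> plan, int numReducers) {
        int base = hashPartition(key, numReducers);
        int slices = plan.getOrDefault(key, 1);
        return (int) ((base + recordIndex % slices) % numReducers);
    }

    // Right input: each record is copied to every reducer holding a slice of its key.
    static List<Integer> partitionRight(String key, Map<String, Integer> plan, int numReducers) {
        int base = hashPartition(key, numReducers);
        int slices = Math.min(plan.getOrDefault(key, 1), numReducers);
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            targets.add((base + i) % numReducers);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Hypothetical sample in which key "a" dominates (7 of 10 records).
        List<String> sample = Arrays.asList("a", "a", "a", "a", "a", "a", "b", "c", "a", "d");
        Map<String, Integer> plan = reducersPerKey(histogram(sample), sample.size(), 0.25);
        int reducers = reducersNeeded(plan, 2);
        System.out.println(plan + " reducers=" + reducers);          // {a=3} reducers=3
        System.out.println(partitionRight("a", plan, reducers));     // [1, 2, 0]
    }
}
```

Replicating only the other input's records for the few skewed keys keeps the extra shuffle volume small, while splitting the heavy key prevents one reducer from receiving most of the sampled input.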
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Status: Open (was: Patch Available)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: skewedjoin.patch
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Status: Patch Available (was: Open)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: skewedjoin.patch

I have attached a patch with most of the suggested modifications. I am still using NullablePartitionWritable instead of a Tuple, since it provides a cleaner approach: it uses an adapter class, so we don't have to check the position of the keys in multiple places. Please ignore the FindBugs warning about taking the absolute value of a hash code; I use it as an array index and hence need the absolute value.
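For context on that FindBugs warning: Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE itself, which is still negative, so an array index derived from Math.abs(hashCode()) can be negative for that one hash value. The patch code is not reproduced here; the standalone Java sketch below (class and method names are hypothetical) contrasts that pattern with masking the sign bit, which is what Hadoop's HashPartitioner does.

```java
// Hypothetical illustration; not code from skewedjoin.patch.
public class AbsHashIndex {

    // Pattern FindBugs flags: negative when hash == Integer.MIN_VALUE.
    static int unsafeIndex(int hash, int buckets) {
        return Math.abs(hash) % buckets;
    }

    // Clearing the sign bit always gives a value in [0, Integer.MAX_VALUE].
    static int safeIndex(int hash, int buckets) {
        return (hash & Integer.MAX_VALUE) % buckets;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;                // a legal hashCode() value
        System.out.println(unsafeIndex(h, 10));   // prints -8
        System.out.println(safeIndex(h, 10));     // prints 0
    }
}
```

Math.floorMod(hash, buckets) (Java 8 and later) is another way to get a non-negative index for a positive bucket count, though it maps negative hash codes to different buckets than masking does.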
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: skewedjoin.patch

Merged from trunk and cleared all unit tests.
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Status: Open (was: Patch Available)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Status: Patch Available (was: Open)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: lojoin.patch, testskewedjoin.patch, skewedjoin.patch

Patches for lojoin and skewedjoin. Unlike the version uploaded earlier, whose root directory was pig, this version of the patch has its root directory set to trunk.
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: lojoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: testskewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: (was: skewedjoin.patch)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Santhosh Srinivasan updated PIG-792:

Status: Patch Available (was: Open)
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Sriranjan Manjunath updated PIG-792:

Attachment: lojoin.patch

Patch contains code for the logical operator LOJoin.
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
Olga Natkovich updated PIG-792:
---
Summary: PERFORMANCE: Support skewed join in pig (was: Support skewed join in pig)