[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-792:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The code has been committed. Thanks, Sri and Ying, for this important 
contribution.

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: skewedjoin.patch


 Fragmented replicated join has a few limitations:
  - One of the tables needs to be loaded into memory
  - Join is limited to two tables
 Skewed join partitions the table and joins the records in the reduce phase. 
 It computes a histogram of the key space to account for skew in the input 
 records. Further, it adjusts the number of reducers according to the key 
 distribution.
 We need to implement skewed join in Pig.
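
To make the description above concrete, here is a minimal Python sketch of one way such a join could work. It is not taken from skewedjoin.patch: the function names, the max_per_reducer threshold, the round-robin spreading of heavy keys, and the replication of the other input's matching records are all assumptions made for illustration. For brevity, the histogram is computed over the whole input rather than over a sample.

```python
import math
import zlib
from collections import Counter, defaultdict


def stable_hash(key):
    """Deterministic hash (Python's built-in hash() is salted for strings)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def plan_skewed_keys(histogram, max_per_reducer, num_reducers):
    """Assign several reducers to every key too frequent for a single reducer."""
    plan = {}
    start = 0
    for key in sorted(histogram, key=repr):
        n = histogram[key]
        if n > max_per_reducer:
            width = min(num_reducers, math.ceil(n / max_per_reducer))
            plan[key] = [(start + i) % num_reducers for i in range(width)]
            start = (start + width) % num_reducers
    return plan


def route(records, plan, num_reducers, replicate):
    """Bucket (key, value) records by reducer id.

    replicate=False: heavy keys are spread round-robin over their reducers.
    replicate=True:  heavy keys are copied to every reducer assigned to them.
    """
    buckets = defaultdict(list)
    cursor = Counter()
    for key, value in records:
        # Keys without a plan entry go to a single hash-chosen reducer.
        targets = plan.get(key, [stable_hash(key) % num_reducers])
        if replicate:
            chosen = targets
        else:
            chosen = [targets[cursor[key] % len(targets)]]
            cursor[key] += 1
        for r in chosen:
            buckets[r].append((key, value))
    return buckets


def skewed_join(left, right, max_per_reducer, num_reducers):
    """Inner-join two lists of (key, value) pairs; `left` is the skewed input."""
    histogram = Counter(key for key, _ in left)
    plan = plan_skewed_keys(histogram, max_per_reducer, num_reducers)
    left_buckets = route(left, plan, num_reducers, replicate=False)
    right_buckets = route(right, plan, num_reducers, replicate=True)

    joined = []
    for r in range(num_reducers):  # each iteration stands in for one reducer
        by_key = defaultdict(list)
        for key, value in right_buckets.get(r, []):
            by_key[key].append(value)
        for key, lvalue in left_buckets.get(r, []):
            for rvalue in by_key.get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined
```

In this sketch, a key that occurs six times with max_per_reducer=2 is given three reducers. Its records from the skewed input are split evenly among them, and the matching records from the other input are copied to all three, so the output is the same as an ordinary inner join.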

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-28 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Open  (was: Patch Available)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: skewedjoin.patch




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-22 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: skewedjoin.patch

I have attached a patch with most of the suggested modifications. I am still 
using NullablePartitionWritable instead of a Tuple: as an adapter class it 
gives a cleaner approach, and we don't have to check the position of the keys 
in multiple places.

Please ignore the FindBugs warning about taking the absolute value of a hash 
code. I use it as an array index and hence need the absolute value.
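
For context on the FindBugs warning: in Java, Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, so the absolute value of a hash code can still be negative in that one case. The Python sketch below emulates Java's 32-bit int arithmetic to show the edge case, along with one common alternative (masking off the sign bit). The helper names are hypothetical and nothing here is taken from the patch.

```python
INT_MIN = -2**31


def to_int32(x):
    """Wrap an integer into Java's signed 32-bit int range."""
    return (x + 2**31) % 2**32 - 2**31


def java_abs(x):
    """Emulate Java's Math.abs(int), including overflow on Integer.MIN_VALUE."""
    return to_int32(-x) if x < 0 else x


def java_rem(a, n):
    """Emulate Java's % operator, whose result takes the sign of the dividend."""
    r = abs(a) % n
    return -r if a < 0 else r


def abs_index(hash_code, size):
    """Index as Math.abs(hashCode) % size would compute it in Java."""
    return java_rem(java_abs(hash_code), size)


def masked_index(hash_code, size):
    """Alternative: clear the sign bit first, so the index is never negative."""
    return (hash_code & 0x7FFFFFFF) % size
```

In this emulation abs_index(INT_MIN, 10) evaluates to -8, which is not a valid array index, while masked_index(INT_MIN, 10) gives 0.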




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-08 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: skewedjoin.patch

Merged from trunk; all unit tests pass.




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-08 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-08 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Open  (was: Patch Available)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-08 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-08 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: lojoin.patch
            testskewedjoin.patch
            skewedjoin.patch

Patches for lojoin and skewedjoin. This version of the patch has its root 
directory set to trunk, as opposed to pig in the version uploaded earlier.

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: lojoin.patch, skewedjoin.patch, testskewedjoin.patch





[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: lojoin.patch)

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: lojoin.patch, skewedjoin.patch, testskewedjoin.patch





[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: testskewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: (was: skewedjoin.patch)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-02 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-792:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-06-23 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-792:


Attachment: lojoin.patch

The patch contains code for the logical operator LOJoin.

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: lojoin.patch





[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-06-11 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-792:
---

Summary: PERFORMANCE: Support skewed join in pig  (was: Support skewed join 
in pig)

 PERFORMANCE: Support skewed join in pig
 ---

 Key: PIG-792
 URL: https://issues.apache.org/jira/browse/PIG-792
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
