[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-18 Thread Alan Gates (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-951:
---

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Ashutosh.

 Reset parallelism to 1 for indexing job in MergeJoin
 

 Key: PIG-951
 URL: https://issues.apache.org/jira/browse/PIG-951
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.6.0

 Attachments: pig-951.patch


 After sampling one tuple from every block, a single reducer sorts the index
 entries in the reduce phase to produce the sorted index used by the actual
 join job. Thus, the parallelism of the indexing job should be explicitly set
 to 1. Currently, it is not.
 At present this is a non-issue, since we don't allow any blocking operators
 in the pipeline before merge-join. However, once we do allow blocking
 operators, the parallelism of the indexing job will be that of the preceding
 blocking operator. Even then, the job will complete successfully, because
 all tuples go to a single reducer: we group on only one key, all. However,
 it will waste cluster resources by starting extra reducers that receive no
 data and thus do nothing.
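
 The wasted reducers described above can be illustrated with a small
 simulation. This is only a sketch, not Pig or Hadoop code: the function
 names, the sample index entries, and the use of Python's built-in hash as a
 stand-in for a hash partitioner are all assumptions made for illustration.

```python
from collections import Counter


def partition(key, num_reducers):
    """Pick a reducer for a key (assumed stand-in for a hash partitioner)."""
    return hash(key) % num_reducers


def reducer_loads(index_entries, num_reducers):
    """Count how many index entries each reducer receives."""
    loads = Counter({r: 0 for r in range(num_reducers)})
    for _entry in index_entries:
        # Every index entry is grouped under the same constant key, "all",
        # so the partitioner sends all of them to the same reducer.
        loads[partition("all", num_reducers)] += 1
    return loads


# Hypothetical index entries: (first join key seen in a block, block id).
entries = [("k%03d" % i, i) for i in range(50)]

# Parallelism inherited from a preceding blocking operator (assumed: 10).
inherited = reducer_loads(entries, num_reducers=10)
idle = [r for r, n in inherited.items() if n == 0]

# Parallelism explicitly reset to 1 for the indexing job.
explicit = reducer_loads(entries, num_reducers=1)

print("idle reducers with inherited parallelism:", len(idle))
print("entries handled by the single reducer:", explicit[0])
```

 In this sketch the result is the same either way (one reducer sees every
 entry), which matches the observation that the job still succeeds; the
 only difference is the number of reducers started for no work.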

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-10 Thread Pradeep Kamath (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-951:
---

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-09 Thread Ashutosh Chauhan (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-951:
-

Attachment: pig-951.patch

One-line patch that fixes this. Also added a test case to catch regressions.
