[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-951: --- Resolution: Fixed Fix Version/s: 0.6.0 Status: Resolved (was: Patch Available) Patch checked in. Thanks Ashutosh. Reset parallelism to 1 for indexing job in MergeJoin Key: PIG-951 URL: https://issues.apache.org/jira/browse/PIG-951 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.6.0 Attachments: pig-951.patch After sampling one tuple from every block, one reducer is used to sort the index entries in reduce phase to produce sorted index to be used in actual join job. Thus, parallelism of index job should be explictly set to 1. Currently, its not. Currently, this is a non-issue, since we don't allow any blocking operators in pipeline before merge-join. However, later when we do allow blocking operators, then parallelism of indexing job will be that of preceding blocking operator. Even then, job will complete successfully because all tuple will go to only one reducer, because we are grouping on only one key all. However, it will waste cluster resources by starting all the extra reducers which get no data and thus do nothing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-951: --- Status: Patch Available (was: Open) Reset parallelism to 1 for indexing job in MergeJoin Key: PIG-951 URL: https://issues.apache.org/jira/browse/PIG-951 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: pig-951.patch After sampling one tuple from every block, one reducer is used to sort the index entries in reduce phase to produce sorted index to be used in actual join job. Thus, parallelism of index job should be explictly set to 1. Currently, its not. Currently, this is a non-issue, since we don't allow any blocking operators in pipeline before merge-join. However, later when we do allow blocking operators, then parallelism of indexing job will be that of preceding blocking operator. Even then, job will complete successfully because all tuple will go to only one reducer, because we are grouping on only one key all. However, it will waste cluster resources by starting all the extra reducers which get no data and thus do nothing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-951: - Attachment: pig-951.patch One line patch which fixes this. Also, added test case to catch regression on this. Reset parallelism to 1 for indexing job in MergeJoin Key: PIG-951 URL: https://issues.apache.org/jira/browse/PIG-951 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: pig-951.patch After sampling one tuple from every block, one reducer is used to sort the index entries in reduce phase to produce sorted index to be used in actual join job. Thus, parallelism of index job should be explictly set to 1. Currently, its not. Currently, this is a non-issue, since we don't allow any blocking operators in pipeline before merge-join. However, later when we do allow blocking operators, then parallelism of indexing job will be that of preceding blocking operator. Even then, job will complete successfully because all tuple will go to only one reducer, because we are grouping on only one key all. However, it will waste cluster resources by starting all the extra reducers which get no data and thus do nothing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.