[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887283#action_12887283 ]

Ashutosh Chauhan commented on PIG-1249:
---------------------------------------

The Map-Reduce framework has a JIRA related to this issue:
https://issues.apache.org/jira/browse/MAPREDUCE-1521 . It has two implications
for Pig:

1) We need to reconsider whether we still want Pig to set the number of 
reducers on the user's behalf. We can choose not to "intelligently" pick the 
number of reducers and instead let the framework fail any job that doesn't 
"correctly" specify it. Then Pig is out of the guessing game, and users are 
forced by the framework to specify the number of reducers correctly.
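A minimal Pig Latin sketch of the explicit parallelism users would then have to supply (the paths, aliases, and reducer count below are illustrative, not taken from the issue):

```pig
-- Load a large data-set; the path and schema are hypothetical.
logs = LOAD '/data/weblogs' AS (user:chararray, bytes:long);
-- PARALLEL explicitly sets the number of reducers for this operator.
grouped = GROUP logs BY user PARALLEL 100;
totals = FOREACH grouped GENERATE group, SUM(logs.bytes);
STORE totals INTO '/data/weblog_totals';
```

Without the PARALLEL clause (and with Pig no longer guessing on the user's behalf), the framework would reject a job whose reducer count falls outside the configured limits.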

2) Now that the MR framework will fail jobs based on configured limits, 
operators where Pig does compute and set the number of reducers (such as 
skewed join) should be aware of those limits, so that the number of reducers 
they compute falls within them.

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> ----------------------------------------------------------------------
>
>                 Key: PIG-1249
>                 URL: https://issues.apache.org/jira/browse/PIG-1249
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Arun C Murthy
>            Assignee: Jeff Zhang
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, 
> PIG_1249_3.patch
>
>
> It would be *very* useful for Pig to have safe-guards against naive scripts 
> which process a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with badly mis-configured #reduces e.g. 1 reduce. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
