[ 
https://issues.apache.org/jira/browse/PIG-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614041#action_12614041
 ] 

Olga Natkovich commented on PIG-318:
------------------------------------

Another idea that we had is to basically implement our chains as Map->Map->Map 
jobs and get rid of reducers alltogether.

> Pipeline optimization for multiple reduces
> ------------------------------------------
>
>                 Key: PIG-318
>                 URL: https://issues.apache.org/jira/browse/PIG-318
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> Any time we chain together M-R jobs, we doing it because we need separate 
> reducer like with group by followed by order by. We don't really need the 
> maps. The ideal graph for us would be:
>  
> M->R->SortShuffle->R->SortShuffle->R ...
>  
> This would allow us to save read from DFs and write to the local disk which 
> could be fairly significant.
> Aparently this similar discussion took place on hadoop mailing list several 
> times and this request was turned down. Main reason is that in their opinion 
> cost of implementing something like that would outweigh the benefit. 
> To make a persuasive case, we need to measure the overhead of the empty maps 
> for "typical" queries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to