[ 
https://issues.apache.org/jira/browse/PIG-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-331:
---------------------------

    Attachment: combiner.patch

A first pass at using the combiner.  This patch contains a couple of things:

1) Foreachs that contain only simple projections and algebraic functions now 
make use of the combiner.
2) Distincts were optimized to not carry the bags of data along.  There is no 
actual need for them to use the combiner because we're only passing the keys.  
Hadoop takes care of collecting the keys and not passing multiple instances 
from map to reduce.
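
To illustrate item 1, here is a minimal sketch of why an algebraic function 
can use the combiner.  It is not code from the patch: the function names 
(initial, intermediate, final, run) and the in-memory "shuffle" are 
assumptions made up for this example, with AVG as the algebraic function.

```python
from collections import defaultdict

# Hypothetical three-stage split of an algebraic AVG. The names are
# illustrative only and are not taken from the patch.
def initial(value):
    # Map side: turn one input value into a partial (sum, count).
    return (value, 1)

def intermediate(partials):
    # Combiner: merge partials into one; safe to run zero or more times.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return (total, count)

def final(partials):
    # Reduce side: merge whatever partials arrive and produce the answer.
    total, count = intermediate(partials)
    return total / count

def run(map_outputs, use_combiner):
    # map_outputs: one list of (key, value) records per simulated map task.
    shuffled = defaultdict(list)
    for records in map_outputs:
        per_key = defaultdict(list)
        for key, value in records:
            per_key[key].append(initial(value))
        for key, partials in per_key.items():
            if use_combiner:
                partials = [intermediate(partials)]
            shuffled[key].extend(partials)
    result = {key: final(partials) for key, partials in shuffled.items()}
    shipped = sum(len(partials) for partials in shuffled.values())
    return result, shipped
```

Calling run() with and without the combiner gives the same averages, but 
with the combiner each map task ships at most one partial per key.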

Things this patch does not contain:

1) Foreachs that have a combination of algebraic and non-algebraic functions.  
These should be doable, with the non-algebraic functions just being replaced 
by simple projections of the appropriate fields in the combiner plan.

2) Foreachs that include inner plans.  This is particularly useful for inner 
plans that use distinct, as this is the way to emulate count(distinct x) from 
SQL, a fairly common operation.  Some inner plans (such as those containing 
filters) cannot be split.  This is a little more challenging because it 
requires duplicating parts of the inner plan in the combiner and reducer and 
not duplicating other parts.

These two should be added later. 
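
As a rough sketch of the count(distinct x) case from item 2, the distinct 
would have to run in both the combiner and the reducer, while the count runs 
only in the reducer.  This is an assumption about how such a split could 
look, not a design from the patch; all function names are hypothetical.

```python
from collections import defaultdict

def combine_distinct(values):
    # Combiner half of the inner plan: drop duplicates within one map task.
    return set(values)

def reduce_count_distinct(partial_sets):
    # Reducer half: the distinct is repeated across map tasks,
    # and only then is the count applied.
    merged = set()
    for partial in partial_sets:
        merged |= partial
    return len(merged)

def count_distinct(map_outputs):
    # map_outputs: one list of (key, value) records per simulated map task.
    shuffled = defaultdict(list)
    for records in map_outputs:
        per_key = defaultdict(list)
        for key, value in records:
            per_key[key].append(value)
        for key, values in per_key.items():
            shuffled[key].append(combine_distinct(values))
    return {key: reduce_count_distinct(parts)
            for key, parts in shuffled.items()}
```

Counting in the combiner as well would be wrong here, since the same value 
can appear in more than one map task's output.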

> Combiner needs to be used in the types branch
> ---------------------------------------------
>
>                 Key: PIG-331
>                 URL: https://issues.apache.org/jira/browse/PIG-331
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>
>         Attachments: combiner.patch
>
>
> The initial implementation in the types branch does not make use of the 
> combiner.  For performance, it needs to make use of the combiner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
