[ https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-7:
-------------------------
Attachment: combiner3.patch
The patch combiner3.patch addresses Utkarsh's points: the previous code didn't
handle a projection containing a nested call such as func(func()), and it didn't
handle a projection of any shape other than group, func(), [func()...]. Both
cases are now explicitly caught.
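For illustration, the shape check described above might look roughly like the
following Python sketch. The expression classes (GroupRef, Column, FuncCall) and
is_combinable are hypothetical stand-ins, not Pig's actual parse-tree types. The
sketch assumes that "func(func())" means a function call with another function
call as an argument, and that "[func()...]" means zero or more extra calls.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical expression nodes; Pig's real plan classes differ.
@dataclass
class GroupRef:
    """A reference to the grouping key."""


@dataclass
class Column:
    """A plain column reference such as $1."""
    name: str


@dataclass
class FuncCall:
    """A function call such as SUM($1)."""
    name: str
    args: List["Expr"] = field(default_factory=list)


Expr = Union[GroupRef, Column, FuncCall]


def is_combinable(projection: List[Expr]) -> bool:
    """Accept only projections shaped like: group, func(), [func()...]."""
    # The grouping key must come first.
    if not projection or not isinstance(projection[0], GroupRef):
        return False
    rest = projection[1:]
    # At least one function call must follow the group.
    if not rest:
        return False
    for expr in rest:
        # Every remaining item must be a function call...
        if not isinstance(expr, FuncCall):
            return False
        # ...with no function call among its arguments (rejects func(func())).
        if any(isinstance(arg, FuncCall) for arg in expr.args):
            return False
    return True
```

With this check, a projection such as group, SUM($1) passes. Both SUM(ABS($1))
and one with the group in second place are rejected.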
One caveat: this patch uses the combiner only in a fairly restrictive case. The
projection must have the group in position 0. We should probably rework this so
that the group can be omitted or placed anywhere. I don't have time to do this
now, but it shouldn't be much work, and it will make the code more flexible to
use.
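A relaxed version of that check, along the lines suggested above, could record
where the group sits instead of requiring it first. This is a hypothetical sketch.
The tuple encoding, plan_combiner, and the limit of one group reference are
assumptions for illustration, not part of the patch.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical encoding: "group" is the grouping key, any other string is a
# plain column, and a (name, args) tuple is a function call.
Expr = Union[str, Tuple[str, list]]


def is_simple_call(expr: Expr) -> bool:
    """True for a function call whose arguments contain no nested call."""
    return isinstance(expr, tuple) and not any(isinstance(a, tuple) for a in expr[1])


def plan_combiner(projection: List[Expr]) -> Optional[Tuple[Optional[int], List[int]]]:
    """Relaxed check: the group may be omitted or sit at any position.

    Returns (group_index_or_None, indices_of_function_calls), or None if the
    projection cannot use the combiner.
    """
    group_idx: Optional[int] = None
    call_idxs: List[int] = []
    for i, expr in enumerate(projection):
        if expr == "group":
            if group_idx is not None:  # assumption: at most one group reference
                return None
            group_idx = i
        elif is_simple_call(expr):
            call_idxs.append(i)
        else:
            return None  # plain column or nested call: not combinable
    if not call_idxs:
        return None
    return group_idx, call_idxs
```

The function returns positions rather than a bare yes/no. That way the final
stage could put its results back in the column order the user wrote.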
> Optimize execution of algebraic functions
> -----------------------------------------
>
> Key: PIG-7
> URL: https://issues.apache.org/jira/browse/PIG-7
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Olga Natkovich
> Assignee: Alan Gates
> Attachments: combiner.patch, combiner2.patch, combiner3.patch
>
>
> Algebraic functions are functions that can be computed incrementally, like
> COUNT(X), SUM(X), etc. They can be computed efficiently by doing the first-level
> computation in the Hadoop combiner. This can give a significant (2-3x) speedup
> for many aggregation queries.
> Several users asked us for this feature, so it is pretty high priority.
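The combiner idea in the issue description can be simulated in a few lines of
Python. Each algebraic function is split into three steps:

* an initial step, run per record on the map side;
* an intermediate step, run in the combiner over one mapper's partial results;
* a final step, run in the reducer.

The stage names, the ALGEBRAIC table, and run_with_combiner are assumptions made
for this sketch, not Pig's or Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical three-stage split of each algebraic function:
#   initial      - applied to each map-side record
#   intermediate - applied in the combiner to one mapper's partials for a key
#   final        - applied in the reducer to all combined partials for a key
ALGEBRAIC = {
    "COUNT": (lambda x: 1, sum, sum),
    "SUM": (lambda x: x, sum, sum),
}


def run_with_combiner(func: str, map_outputs: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Aggregate (key, value) pairs from several mappers using a combiner step."""
    initial, intermediate, final = ALGEBRAIC[func]
    shuffled: Dict[str, List[int]] = defaultdict(list)
    for split in map_outputs:
        # Combine each mapper's output locally before the shuffle.
        local: Dict[str, List[int]] = defaultdict(list)
        for key, value in split:
            local[key].append(initial(value))
        for key, partials in local.items():
            shuffled[key].append(intermediate(partials))
    # The reducer sees one partial per key per mapper, not every record.
    return {key: final(partials) for key, partials in shuffled.items()}
```

With two mappers holding [("a", 3), ("b", 5), ("a", 4)] and [("a", 10), ("b", 1)],
COUNT gives {"a": 3, "b": 2} and SUM gives {"a": 17, "b": 6}. These match the
results without a combiner. The saving comes from shuffling only one partial
value per key per mapper.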