[ https://issues.apache.org/jira/browse/PIG-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-409:
-------------------------------
Fix Version/s: types_branch
Affects Version/s: types_branch
> PERFORMANCE: Removing Union from map side of query with COGROUP
> ---------------------------------------------------------------
>
> Key: PIG-409
> URL: https://issues.apache.org/jira/browse/PIG-409
> Project: Pig
> Issue Type: Improvement
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Fix For: types_branch
>
>
> Currently, the map-side code does not know which input of the cogroup it is
> processing, so it assumes it may be processing any of them and puts a union
> at the end of the pipeline. This is fairly inefficient.
> A better approach would be to determine which file is being processed in the
> configure call. There seems to be a way to do this with Hadoop, but it is not
> documented and so might not be guaranteed; this needs a follow-up with
> somebody from the Hadoop project.
> Another approach is to check on the first map call and pick the execution
> plan that matches that input.
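
The second approach (deciding on the first map call) could look roughly like the sketch below. It is not Pig or Hadoop code: the class FirstCallPlanSelector, its methods, and the use of a plain Function as a stand-in for a per-input execution plan are all hypothetical, and it assumes the input name is somehow available when map is called, which is exactly the open question in the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: pick one per-input plan on the first map call
 * instead of pushing every record through a union of all plans.
 * None of these names come from Pig or Hadoop.
 */
final class FirstCallPlanSelector {
    // One candidate plan per cogroup input, keyed by input name (assumed known).
    private final Map<String, Function<String, String>> plansByInput = new HashMap<>();
    // Stays null until the first map call chooses a plan.
    private Function<String, String> selected;
    // Counts how many times a plan was chosen; expected to stay at 1 per task.
    private int selections = 0;

    void register(String inputName, Function<String, String> plan) {
        plansByInput.put(inputName, plan);
    }

    String map(String inputName, String record) {
        if (selected == null) {
            // First call: resolve the plan once; later calls skip the lookup.
            selected = plansByInput.get(inputName);
            if (selected == null) {
                throw new IllegalArgumentException("No plan registered for input: " + inputName);
            }
            selections++;
        }
        return selected.apply(record);
    }

    int selectionCount() {
        return selections;
    }
}
```

The sketch relies on a map task reading from a single input, so the plan chosen on the first call stays valid for every later record in that task. If that assumption did not hold, the selection would have to be redone per record and most of the saving would be lost.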