Olga Natkovich commented on PIG-629:

Patch committed. Thanks, Pradeep.

> PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()
> -----------------------------------------------------------------------------
>                 Key: PIG-629
>                 URL: https://issues.apache.org/jira/browse/PIG-629
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-629.patch
> Currently, each Tuple read in by Pig is wrapped in a TargetedTuple, which has 
> an attribute holding a list of operator keys for the root operators that the 
> tuple is targeted at. For example, in a cogroup query the tuple is destined 
> for one of the two roots of the plan, depending on which input it comes from. 
> This information is carried in the TargetedTuple. However, this adds 
> unnecessary overhead at load time in a map task: the extra list must be 
> attached to every tuple, and on every entry into map() the operators 
> corresponding to the operator keys in the list must be looked up in the map 
> plan.
> This overhead can be eliminated by serializing the list of target operators 
> once at the Record Reader level and deserializing it in the configure() of 
> the mapper. After deserialization, the operators corresponding to the 
> operator keys can also be looked up in configure() itself. The setup is then 
> done once in configure(), instead of adding overhead to every input tuple and 
> every map() call.
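
The one-time setup described in the issue can be illustrated with a small, self-contained Java sketch. It is not the code from PIG-629.patch. Every name in it (TargetSetupSketch, Operator, SketchMapper, serializeTargets, deserializeTargets) is hypothetical. Passing the serialized list around as a Base64 string is an assumption made only to keep the example runnable without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TargetSetupSketch {

    /** Hypothetical stand-in for a root operator of the map plan. */
    static final class Operator {
        final String key;
        final List<String> received = new ArrayList<>();
        Operator(String key) { this.key = key; }
        void attachInput(String tuple) { received.add(tuple); }
    }

    /** Done once per input (assumed: on the record-reader side), not once per tuple. */
    static String serializeTargets(List<String> operatorKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(operatorKeys));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    @SuppressWarnings("unchecked")
    static List<String> deserializeTargets(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical mapper: resolves its targets once in configure(). */
    static final class SketchMapper {
        private final Map<String, Operator> mapPlan;
        private Operator[] targets = new Operator[0];

        SketchMapper(Map<String, Operator> mapPlan) { this.mapPlan = mapPlan; }

        /** One-time setup: deserialize the keys and look up the operators. */
        void configure(String encodedTargets) {
            List<String> keys = deserializeTargets(encodedTargets);
            Operator[] resolved = new Operator[keys.size()];
            for (int i = 0; i < resolved.length; i++) {
                Operator op = mapPlan.get(keys.get(i));
                if (op == null) {
                    throw new IllegalStateException("No operator for key " + keys.get(i));
                }
                resolved[i] = op;
            }
            targets = resolved;
        }

        /** Per-tuple path: no wrapper object and no plan lookup. */
        void map(String tuple) {
            for (Operator op : targets) {
                op.attachInput(tuple);
            }
        }
    }

    public static void main(String[] args) {
        // Two roots, as in the cogroup example; this mapper reads input A only.
        Map<String, Operator> plan = new HashMap<>();
        plan.put("root-A", new Operator("root-A"));
        plan.put("root-B", new Operator("root-B"));

        SketchMapper mapper = new SketchMapper(plan);
        mapper.configure(serializeTargets(Arrays.asList("root-A")));
        mapper.map("tuple-1");
        mapper.map("tuple-2");

        System.out.println("root-A received " + plan.get("root-A").received.size());
        System.out.println("root-B received " + plan.get("root-B").received.size());
    }
}
```

In this sketch the per-tuple path in map() only iterates over an array that was already resolved, which is the saving the issue is after. How the patch itself carries the list from the Record Reader to configure() is not shown in this message.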

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
