PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()

                 Key: PIG-629
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch

Currently, each Tuple read in by Pig is wrapped in a TargetedTuple, which has 
an attribute holding a list of operator keys that identify the root operators 
the tuple is targeted at. For example, in a cogroup query a tuple is destined 
for one of the two roots of the plan, depending on which input it came from. 
The TargetedTuple carries this information. However, this adds unnecessary 
overhead at load time in the map: the extra list has to be attached to every 
tuple, and on every entry into map() the operators corresponding to the 
operator keys in the list have to be looked up in the map plan.
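
To make the per-tuple cost concrete, here is a rough sketch of the pattern 
described above. It is not the actual Pig code: TargetedTupleSketch, MAP_PLAN, 
and the lookups counter are hypothetical stand-ins, and operators are 
represented as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerTupleTargeting {
    // Hypothetical stand-in for a tuple that carries its own target list.
    static final class TargetedTupleSketch {
        final List<Object> fields;
        final List<String> targetOperatorKeys;

        TargetedTupleSketch(List<Object> fields, List<String> targetOperatorKeys) {
            this.fields = fields;
            this.targetOperatorKeys = targetOperatorKeys;
        }
    }

    // Hypothetical map plan: operator key -> operator (a String here).
    static final Map<String, String> MAP_PLAN = new HashMap<>();

    // Counts plan lookups so the per-tuple cost is visible.
    static int lookups = 0;

    // Every call resolves the tuple's operator keys against the plan again.
    static List<String> map(TargetedTupleSketch tuple) {
        List<String> roots = new ArrayList<>();
        for (String key : tuple.targetOperatorKeys) {
            lookups++;
            roots.add(MAP_PLAN.get(key));
        }
        return roots;
    }
}
```

In this sketch the number of plan lookups grows with the number of input 
tuples, even though every tuple from the same input resolves to the same 
roots.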

This overhead can be eliminated by serializing the list of target operators 
at the Record Reader level and deserializing it in the configure() of the 
map. After deserialization, the operators corresponding to the operator keys 
can also be looked up in configure() itself. The setup is then done once, in 
configure(), instead of adding overhead to each input tuple and each map() 
call.
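
The proposal can be sketched roughly as follows. This is only an illustration 
under stated assumptions, not the actual patch: MapperSketch, the 
TARGETS_PROPERTY name, the comma-joined encoding (which assumes operator keys 
contain no commas), and the plain Map used as a job configuration are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfigureOnceTargeting {
    // Hypothetical property name under which the target list is stored.
    static final String TARGETS_PROPERTY = "sketch.map.target.ops";

    // Reader side: serialize the operator keys once per input, not per tuple.
    // Assumes the keys themselves contain no commas.
    static String serializeTargets(List<String> operatorKeys) {
        return String.join(",", operatorKeys);
    }

    static final class MapperSketch {
        private final Map<String, String> mapPlan; // operator key -> operator
        private List<String> roots = new ArrayList<>();
        int lookups = 0; // counts plan lookups for illustration

        MapperSketch(Map<String, String> mapPlan) {
            this.mapPlan = mapPlan;
        }

        // One-time setup: deserialize the keys and resolve the operators.
        void configure(Map<String, String> jobConf) {
            roots = new ArrayList<>();
            for (String key : jobConf.get(TARGETS_PROPERTY).split(",")) {
                lookups++;
                roots.add(mapPlan.get(key));
            }
        }

        // Per-tuple path: plain tuples, no wrapper and no plan lookup.
        List<String> map(List<Object> tuple) {
            return roots;
        }
    }
}
```

Here the lookup count depends only on the number of target operators, not on 
the number of tuples passed to map().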

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
