[ https://issues.apache.org/jira/browse/PIG-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-629.
--------------------------------

    Resolution: Fixed

> PERFORMANCE: Eliminate use of TargetedTuple for each input tuple in the map()
> -----------------------------------------------------------------------------
>
>                 Key: PIG-629
>                 URL: https://issues.apache.org/jira/browse/PIG-629
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-629.patch
>
>
> Currently each Tuple read in by Pig is wrapped into a TargetedTuple which has
> an attribute holding a list of operator keys corresponding to the root
> operators for which the tuple is targeted. For example in a cogroup query the
> tuple would be destined for one of the two roots of the plan depending on
> which input it is sourced from. This information is contained in the
> TargetedTuple. However this adds unnecessary overhead at load time in a map
> as for each tuple this extra list needs to be attached and also on entry into
> the map(), the operators corresponding to the operator keys in the list need
> to be looked up in the map plan.
>
> This overhead can be eliminated by just serializing this list of target
> operators at the Record Reader level and then deserializing the list in the
> configure() of the map(). After deserialization, the actual operators
> corresponding to the operator keys can also be looked up in the configure()
> itself. This way this setup is done one time in the configure() rather than
> adding extra overhead to each input tuple and each map() call.
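
The approach described in the issue can be illustrated with a minimal, self-contained Java sketch. It is not the code from PIG-629.patch. The class name TargetedMapSketch, the configuration key "sketch.map.target.ops", the use of plain java.util.Map objects in place of the Hadoop job configuration and the Pig map plan, and the use of strings in place of operators and tuples are all assumptions made for illustration. The sketch only shows the shape of the idea: the list of target operator keys is serialized once, deserialized and resolved once in configure(), and map() then receives bare tuples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a map task; this is not Pig's actual mapper class.
class TargetedMapSketch {
    // Hypothetical configuration key; the issue does not name the real one.
    static final String TARGET_OPS_KEY = "sketch.map.target.ops";

    // Root operators resolved once in configure() (strings stand in for operators).
    private final List<String> roots = new ArrayList<>();
    // Records what each root received, so the behaviour can be inspected.
    final List<String> delivered = new ArrayList<>();

    // Record-reader side: serialize the target operator keys once per input,
    // instead of attaching the list to every tuple.
    static String serializeTargets(List<String> operatorKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(operatorKeys));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // configure(): deserialize the keys and look up the operators in the
    // map plan one time, before any tuple is processed.
    @SuppressWarnings("unchecked")
    void configure(Map<String, String> conf, Map<String, String> mapPlan) {
        byte[] raw = Base64.getDecoder().decode(conf.get(TARGET_OPS_KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            List<String> keys = (List<String>) in.readObject();
            for (String key : keys) {
                String op = mapPlan.get(key);
                if (op == null) {
                    throw new IllegalStateException("No operator for key " + key);
                }
                roots.add(op);
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not read target operator keys", e);
        }
    }

    // map(): the bare tuple goes straight to the pre-resolved roots;
    // there is no per-tuple wrapper and no per-call lookup.
    void map(String tuple) {
        for (String root : roots) {
            delivered.add(root + " <- " + tuple);
        }
    }
}
```

In this sketch the per-tuple cost inside map() is reduced to iterating over an already resolved list, which is the saving the issue describes. How the real patch passes the serialized list from the record reader to configure() is not stated in this message.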