[ https://issues.apache.org/jira/browse/PIG-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783716#action_12783716 ]
Jeff Zhang commented on PIG-688:
--------------------------------

Dmitriy, thank you for your explanation. So my understanding is that we only need to create a tuple buffer of size n in PigMapBase, process the buffer as a batch through the map plan, and finally process the remaining tuples in the close() method.

> PERFORMANCE: Vectorize operators
> --------------------------------
>
>                 Key: PIG-688
>                 URL: https://issues.apache.org/jira/browse/PIG-688
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Thejas M Nair
>
> By vectorization, I mean passing multiple records (a vector of records) at a time
> between operators (and potentially other functions such as UDFs).
> Vectorization of Pig operators can improve performance by:
> 1. Improving locality and cache utilization.
> 2. Reducing the number of function calls. Many function calls are likely to be
> dynamically resolved. There may be some checks in each function that we might
> be able to do once for several records.
> 3. Potentially benefiting from CPU pipeline architecture. (But I don't know how
> good the Java VM is at that.)
> To do vectorization in the map stage, we need to use MapRunner - see PIG-687.
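
The buffering scheme described in the comment above (collect n tuples, push them through the plan as one batch, flush the remainder in close()) could be sketched roughly as follows. This is only an illustration, not Pig code: TupleBatcher, batchSize, and the Consumer sink standing in for the map plan are all hypothetical names, and the real PigMapBase map()/close() signatures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Pig: buffers tuples and hands them to a
// batch processor (a stand-in for the map plan) n at a time.
final class TupleBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchProcessor;
    private final List<T> buffer;

    TupleBatcher(int batchSize, Consumer<List<T>> batchProcessor) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.batchProcessor = batchProcessor;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Called once per input record, as a map() method would be.
    void map(T tuple) {
        buffer.add(tuple);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Called once at the end of input; processes the tuples that did not
    // fill a complete batch.
    void close() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // Pass a copy so the processor may keep the batch after the
        // internal buffer is cleared and reused.
        batchProcessor.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        TupleBatcher<String> batcher =
                new TupleBatcher<>(2, batch -> System.out.println(batch));
        batcher.map("a");
        batcher.map("b");
        batcher.map("c");
        batcher.close();
    }
}
```

With a batch size of 2 and three input records, the processor is invoked once from map() with a full batch and once from close() with the single leftover record. Copying the buffer on each flush is a simplification; a real implementation might reuse buffers to avoid the extra allocation.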