[
https://issues.apache.org/jira/browse/PIG-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783414#action_12783414
]
Jeff Zhang commented on PIG-688:
--------------------------------
Thejas, do you mean creating a new MapRunnable class for Pig?
The default MapRunnable implementation in Hadoop, MapRunner, reads records one
by one, so if we want to pass multiple records at a time between operators, we
need to create a new MapRunner for it.
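
As a rough illustration of the difference, here is a minimal sketch in plain
Java. It does not use the Hadoop or Pig APIs: BatchingRunnerSketch,
BatchOperator, and run are hypothetical names invented for this example, and a
real runner would read from Hadoop's record reader instead of an Iterator. The
loop collects up to batchSize records and makes one operator call per batch
rather than one call per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BatchingRunnerSketch {

    /** Hypothetical operator that accepts a whole batch of records per call. */
    public interface BatchOperator<T> {
        void processBatch(List<T> batch);
    }

    /**
     * Drains the input and hands records to the operator in groups of at most
     * batchSize. Returns the number of operator calls that were made.
     */
    public static <T> int run(Iterator<T> input, int batchSize, BatchOperator<T> op) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int calls = 0;
        while (input.hasNext()) {
            batch.add(input.next());
            if (batch.size() == batchSize) {
                op.processBatch(batch);
                calls++;
                // Start a fresh list so the operator may keep the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            op.processBatch(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        int calls = run(records.iterator(), 2, batch -> System.out.println(batch));
        System.out.println("operator calls: " + calls);
    }
}
```

With five records and a batch size of 2, the operator is called three times
instead of five. The batch size is an assumed tuning parameter; the issue does
not specify one.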
> PERFORMANCE: Vectorize operators
> --------------------------------
>
> Key: PIG-688
> URL: https://issues.apache.org/jira/browse/PIG-688
> Project: Pig
> Issue Type: Improvement
> Reporter: Thejas M Nair
>
> By vectorization, I mean passing multiple records (a vector of records) at a
> time between operators (and potentially other functions such as UDFs).
> Vectorization of Pig operators can improve performance by
> 1. Improving locality and cache utilization.
> 2. Reducing the number of function calls. Many function calls are likely to be
> dynamically resolved. There may be some checks in each function that we might
> be able to do once for several records.
> 3. Potentially benefiting from the CPU pipeline architecture. (But I don't know
> how good the Java VM is at that.)
> To do vectorization in map stage, we need to use MapRunner - see PIG-687.