[jira] Commented: (PIG-688) PERFORMANCE: Vectorize operators

Jeff Zhang (JIRA) Mon, 30 Nov 2009 07:57:44 -0800

    [ https://issues.apache.org/jira/browse/PIG-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783716#action_12783716 ]


Jeff Zhang commented on PIG-688:
--------------------------------

Dmitriy, thank you for your explanation. So my understanding is that we only
need to create a Tuple Buffer of size n in PigMapBase, process the buffer as a
batch through the Map Plan, and finally process the remaining tuples in the
close() method.
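A minimal Java sketch of that buffering scheme follows. It assumes a hypothetical `BatchPlan` interface standing in for the map plan; `BufferingMapper` is an illustrative name, not Pig's actual PigMapBase API. map() only fills the buffer and runs the plan when n tuples have accumulated, and close() flushes the final, possibly shorter, batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the map plan: consumes a whole batch of tuples at once.
interface BatchPlan<T> {
    void processBatch(List<T> batch);
}

// Illustrative mapper that buffers up to n tuples before pushing them through the plan.
// Not a Pig class; it only mirrors the map()/close() life cycle described above.
class BufferingMapper<T> {
    private final int n;
    private final BatchPlan<T> plan;
    private final List<T> buffer;

    BufferingMapper(int n, BatchPlan<T> plan) {
        if (n <= 0) throw new IllegalArgumentException("buffer size must be positive");
        this.n = n;
        this.plan = plan;
        this.buffer = new ArrayList<>(n);
    }

    // Called once per input record; the plan runs only when the buffer is full.
    void map(T tuple) {
        buffer.add(tuple);
        if (buffer.size() == n) {
            flush();
        }
    }

    // Called once at the end of the task: the last batch may hold fewer than n tuples.
    void close() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // Pass a copy so the plan may keep the batch after the buffer is cleared and reused.
        plan.processBatch(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With n = 3 and 7 input tuples, the plan sees batches of 3 and 3 during map(), and the remaining single tuple only when close() is called.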

> PERFORMANCE: Vectorize operators
> --------------------------------
>
>                 Key: PIG-688
>                 URL: https://issues.apache.org/jira/browse/PIG-688
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Thejas M Nair
>
> By vectorization, I mean passing multiple records (a vector of records) at a
> time between operators (and potentially other functions, such as UDFs).
> Vectorization of Pig operators can improve performance by:
> 1. Improving locality and cache utilization.
> 2. Reducing the number of function calls. Many function calls are likely to be
> dynamically resolved. Each function may also contain checks that we might be
> able to do once for several records instead of once per record.
> 3. Potentially benefiting from the CPU pipeline architecture (though I don't
> know how good the Java VM is at that).
> To do vectorization in the map stage, we need to use MapRunner; see PIG-687.
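To make point 2 concrete, here is a hedged Java sketch of a batch-at-a-time operator interface. `VectorOperator`, `ThresholdFilter`, `Scale` and `VectorPipeline` are hypothetical names, not Pig classes. The sketch only counts operator invocations; it says nothing about actual speedups on the JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vectorized operator: one (interface-dispatched) call handles a whole batch.
interface VectorOperator {
    List<Integer> apply(List<Integer> batch);
}

// Keeps values greater than a threshold. The configuration check runs once per batch,
// not once per record.
class ThresholdFilter implements VectorOperator {
    private final Integer threshold;

    ThresholdFilter(Integer threshold) { this.threshold = threshold; }

    public List<Integer> apply(List<Integer> batch) {
        if (threshold == null) throw new IllegalStateException("threshold not set");
        List<Integer> out = new ArrayList<>(batch.size());
        for (int v : batch) {
            if (v > threshold) out.add(v);
        }
        return out;
    }
}

// Multiplies every value in the batch by a constant factor.
class Scale implements VectorOperator {
    private final int factor;

    Scale(int factor) { this.factor = factor; }

    public List<Integer> apply(List<Integer> batch) {
        List<Integer> out = new ArrayList<>(batch.size());
        for (int v : batch) out.add(v * factor);
        return out;
    }
}

// Pushes records through a chain of operators in batches of batchSize.
// batchSize == 1 reproduces record-at-a-time processing.
class VectorPipeline {
    private final List<VectorOperator> ops;
    int operatorCalls = 0; // number of apply() invocations made so far

    VectorPipeline(List<VectorOperator> ops) { this.ops = ops; }

    List<Integer> run(List<Integer> records, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<Integer> result = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            List<Integer> batch = records.subList(start, Math.min(start + batchSize, records.size()));
            for (VectorOperator op : ops) {
                batch = op.apply(batch);
                operatorCalls++;
            }
            result.addAll(batch);
        }
        return result;
    }
}
```

With the records 1 to 10 and the chain ThresholdFilter(4) then Scale(10), both batch size 1 and batch size 4 produce [50, 60, 70, 80, 90, 100], but batch size 1 makes 20 apply() calls while batch size 4 makes 6.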

