[ 
https://issues.apache.org/jira/browse/PIG-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-108:
------------------------------

    Assignee: Stefan Groschupf

> PigCombine does not use configure method and therefore de-serialize and 
> instantiate objects with every reduce call
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-108
>                 URL: https://issues.apache.org/jira/browse/PIG-108
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.1.0
>            Reporter: Stefan Groschupf
>            Assignee: Stefan Groschupf
>            Priority: Critical
>             Fix For: 0.1.0
>
>         Attachments: PIG-108-r639015-v1.patch
>
>
> There is significant room for improvement in PigCombine. 
> In each reduce call, several objects are deserialized from the jobConf and 
> the object graph is rebuilt again and again. 
> Hadoop guarantees that the configure method is called before a run, so things 
> like inputCount can then be cached as fields. 
> The jobConf does not change between reduce calls, so re-deserializing and 
> re-instantiating all these objects 
> (pigContext, evalPipe, inputCount, oc, finalout, esp, and so on) 
> makes no sense from my point of view.
> I am not sure how often PigCombine is used, but fixing this will 
> significantly improve performance.
> Was there a reason to do things this way, or is it just historical? 
> As soon as the test suite is running again, I would be happy to work on a 
> patch if there are no other opinions on this. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
