PigCombine does not use the configure method and therefore deserializes and 
instantiates objects with every reduce call
------------------------------------------------------------------------------------------------------------------

                 Key: PIG-108
                 URL: https://issues.apache.org/jira/browse/PIG-108
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.1.0
            Reporter: Stefan Groschupf
            Priority: Critical
             Fix For: 0.1.0


There is significant room for improvement in PigCombine. 
In each reduce call, some objects are deserialized from the jobConf and the 
object graph is rebuilt again and again. 
Hadoop guarantees that the configure method is called before a run, so things 
like inputCount can then be cached as fields. 
The jobConf does not change between reduce calls, so re-deserializing and 
re-instantiating all of these objects (pigContext, evalPipe, inputCount, oc, 
finalout, esp and so on) makes no sense from my point of view.
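
A minimal sketch of the idea, not the actual PigCombine code: the 
configuration is modeled as a plain Map instead of Hadoop's JobConf, and the 
class name, the key "sketch.inputCount" and the deserializeCalls counter are 
hypothetical names used only for illustration. It shows the value being read 
once in configure and reused as a field by every reduce call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a combiner; it does not use the Hadoop API.
class CachingCombinerSketch {
    // Counts how often the (simulated) expensive deserialization runs.
    int deserializeCalls = 0;

    // Cached once in configure() and reused by every reduce() call.
    private int inputCount;

    // Stand-in for reading and deserializing a value from the job configuration.
    private int deserializeInputCount(Map<String, String> conf) {
        deserializeCalls++;
        return Integer.parseInt(conf.get("sketch.inputCount"));
    }

    // Called once before any reduce() call: do all expensive setup here.
    void configure(Map<String, String> conf) {
        inputCount = deserializeInputCount(conf);
    }

    // Called once per key: only uses the cached field, never touches the conf.
    String reduce(String key, Iterator<Integer> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return key + "\t" + sum + "\tinputs=" + inputCount;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("sketch.inputCount", "2");

        CachingCombinerSketch combiner = new CachingCombinerSketch();
        combiner.configure(conf);

        System.out.println(combiner.reduce("a", Arrays.asList(1, 2, 3).iterator()));
        System.out.println(combiner.reduce("b", Arrays.asList(4, 5).iterator()));
        // The configuration was deserialized once, regardless of the number of reduce calls.
        System.out.println("deserializations: " + combiner.deserializeCalls);
    }
}
```

In the real class the same pattern would presumably apply to the other 
objects listed above, as long as none of them carries state that has to be 
reset between reduce calls.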

I am not sure how often PigCombine is used, but fixing this will significantly 
improve performance.
Was there any reason to do things this way, or is it just historical? 
As soon as the test suite is running again, I would be happy to work on a patch if 
there are no other opinions on that. 




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
