[ https://issues.apache.org/jira/browse/PIG-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-108:
------------------------------

    Assignee: Stefan Groschupf

> PigCombine does not use configure method and therefore de-serialize and
> instantiate objects with every reduce call
> ------------------------------------------------------------------------
>
>                 Key: PIG-108
>                 URL: https://issues.apache.org/jira/browse/PIG-108
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.1.0
>            Reporter: Stefan Groschupf
>            Assignee: Stefan Groschupf
>            Priority: Critical
>             Fix For: 0.1.0
>
>         Attachments: PIG-108-r639015-v1.patch
>
>
> There is significant room for improvement in PigCombine.
> In each reduce call, some objects are deserialized from the jobConf and the
> object graph is rebuilt again and again.
> Hadoop guarantees that the configure method is called before a run, so things
> like inputCount can then be cached as fields.
> The jobConf does not change between reduce calls, so re-deserializing and
> re-instantiating all of these objects (pigContext, evalPipe, inputCount, oc,
> finalout, esp, and so on) makes no sense from my point of view.
> I am not sure how often PigCombine is used, but fixing this will significantly
> improve performance.
> Was there a reason to do things this way, or is it just historical?
> As soon as the test suite is running again, I would be happy to work on a
> patch if there are no other opinions on this.
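
The change proposed in the issue can be illustrated with a small, dependency-free sketch. It is not the PigCombine code and not the attached patch. FakeJobConf, ExpensiveState, PerCallCombiner, CachingCombiner and the "example.inputCount" key are all hypothetical stand-ins, used here so the example runs without Hadoop on the classpath. The only point it tries to show is the difference between rebuilding state inside every reduce call and building it once in configure and keeping it in a field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class CombinerConfigureSketch {

    /** Hypothetical stand-in for Hadoop's JobConf: a plain string-to-string map. */
    static final class FakeJobConf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for state that is costly to rebuild (pigContext, evalPipe, ...). */
    static final class ExpensiveState {
        static int builds = 0;          // counts how often the state was constructed
        final int inputCount;
        ExpensiveState(FakeJobConf conf) {
            builds++;
            // "example.inputCount" is an illustrative key, not a real Pig setting.
            inputCount = Integer.parseInt(conf.get("example.inputCount"));
        }
    }

    static int sum(Iterator<Integer> values) {
        int total = 0;
        while (values.hasNext()) total += values.next();
        return total;
    }

    /** Pattern described in the issue: state is rebuilt from the conf on every reduce call. */
    static final class PerCallCombiner {
        private FakeJobConf conf;
        void configure(FakeJobConf conf) { this.conf = conf; }
        int reduce(Iterator<Integer> values) {
            ExpensiveState state = new ExpensiveState(conf);
            return state.inputCount > 0 ? sum(values) : 0;
        }
    }

    /** Proposed pattern: state is built once in configure and cached as a field. */
    static final class CachingCombiner {
        private ExpensiveState state;
        void configure(FakeJobConf conf) { state = new ExpensiveState(conf); }
        int reduce(Iterator<Integer> values) {
            return state.inputCount > 0 ? sum(values) : 0;
        }
    }

    /** Runs configure once, then reduce `calls` times, and reports how often state was built. */
    static int countBuilds(boolean cached, int calls) {
        ExpensiveState.builds = 0;
        FakeJobConf conf = new FakeJobConf();
        conf.set("example.inputCount", "2");
        PerCallCombiner perCall = new PerCallCombiner();
        CachingCombiner caching = new CachingCombiner();
        if (cached) caching.configure(conf); else perCall.configure(conf);
        for (int i = 0; i < calls; i++) {
            Iterator<Integer> values = Arrays.asList(1, 2, 3).iterator();
            int result = cached ? caching.reduce(values) : perCall.reduce(values);
            if (result != 6) throw new IllegalStateException("unexpected sum: " + result);
        }
        return ExpensiveState.builds;
    }

    public static void main(String[] args) {
        System.out.println("per-call builds: " + countBuilds(false, 3)); // prints 3
        System.out.println("cached builds: " + countBuilds(true, 3));    // prints 1
    }
}
```

In this sketch both variants produce the same sums. Only the number of times the state is constructed differs, and that construction is the cost the issue asks to remove from the reduce path.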