PigCombine does not use the configure method and therefore deserializes and
instantiates objects on every reduce call
------------------------------------------------------------------------------------------------------------------
Key: PIG-108
URL: https://issues.apache.org/jira/browse/PIG-108
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.1.0
Reporter: Stefan Groschupf
Priority: Critical
Fix For: 0.1.0
There is significant room for improvement in PigCombine.
On every reduce call, several objects are deserialized from the JobConf and the
object graph is rebuilt again and again.
Hadoop guarantees that the configure method is called before any reduce call,
so values such as inputCount can be computed once and cached in fields.
Since the JobConf does not change between reduce calls, repeatedly
deserializing and instantiating all these objects (pigContext, evalPipe,
inputCount, oc, finalout, esp, and so on) makes no sense from my point of view.
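A minimal sketch of the proposed configure-once pattern, written in plain Java so it runs without Hadoop on the classpath. Assumptions: a Map<String, String> stands in for JobConf, and the class name CachingCombineSketch, the configuration keys, and the fields are hypothetical, not Pig's actual code. In the real org.apache.hadoop.mapred API the same split is expressed through configure(JobConf), inherited by Reducer from JobConfigurable, and the per-key reduce(...) method.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map<String, String> stands in for Hadoop's JobConf,
// and the config keys below are invented for illustration.
public class CachingCombineSketch {
    // State built once in configure() and reused by every reduce() call.
    private String pipelineSpec;
    private int inputCount;
    // Counts how many times the expensive setup ran (for demonstration only).
    private int setupCalls = 0;

    /** Called once per task, before any reduce() call. */
    public void configure(Map<String, String> jobConf) {
        setupCalls++;
        // Stand-in for deserializing pigContext, evalPipe, etc. from the job config.
        pipelineSpec = jobConf.get("hypothetical.pipeline.spec");
        inputCount = Integer.parseInt(jobConf.get("hypothetical.input.count"));
    }

    /** Called once per key; reads the cached fields instead of re-deserializing. */
    public String reduce(String key, Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return String.format("%s/%d %s=%d", pipelineSpec, inputCount, key, sum);
    }

    public int getSetupCalls() {
        return setupCalls;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hypothetical.pipeline.spec", "sum");
        conf.put("hypothetical.input.count", "1");

        CachingCombineSketch combiner = new CachingCombineSketch();
        combiner.configure(conf); // setup runs exactly once
        for (String key : List.of("a", "b", "c")) {
            System.out.println(combiner.reduce(key, List.of(1L, 2L, 3L).iterator()));
        }
        System.out.println("setup calls: " + combiner.getSetupCalls());
    }
}
```

Running main prints one line per key followed by "setup calls: 1": the setup cost is paid once per task rather than once per reduce call.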
I am not sure how often PigCombine is used, but fixing this would significantly
improve performance.
Was there a reason for doing it this way, or is it just historical?
As soon as the test suite is running again, I would be happy to work on a patch
if there are no other opinions on this.
--
This message is automatically generated by JIRA.