[
https://issues.apache.org/jira/browse/PIG-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583315#action_12583315
]
Stefan Groschupf commented on PIG-108:
--------------------------------------
That surprises me, since deserializing an object from a String on every reduce
call should have a significant performance impact.
The PigContext was deserialized from a String on every reduce call, because this
was done before the null check.
The old code looked like this:
{{
public void reduce(WritableComparable key, Iterator values,
        OutputCollector output, Reporter reporter) throws IOException {
    try {
        PigContext pigContext = (PigContext)
                ObjectSerializer.deserialize(job.get("pig.pigContext"));
        if (evalPipe == null) {
}}
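To illustrate the idea (this is not the actual patch), here is a minimal, self-contained sketch of deserializing once in configure and reusing the cached field in reduce. It deliberately avoids the Hadoop and Pig classes: Context, Combiner, serialize, deserialize, and the Map-based job configuration are hypothetical stand-ins for PigContext, PigCombine, ObjectSerializer, and JobConf, and the call counter exists only to make the effect visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class CombinerCachingSketch {

    /** Hypothetical stand-in for PigContext. */
    static class Context implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Context(String name) { this.name = name; }
    }

    /** Counts deserialize() calls so the demo can show how often it runs. */
    static int deserializeCalls = 0;

    /** Hypothetical stand-in for ObjectSerializer.serialize: object to Base64 String. */
    static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Hypothetical stand-in for ObjectSerializer.deserialize: Base64 String to object. */
    static Object deserialize(String encoded) throws IOException, ClassNotFoundException {
        deserializeCalls++;
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    /** Combiner that pays the deserialization cost once, in configure(). */
    static class Combiner {
        private Context context; // cached as a field, reused by every reduce() call

        void configure(Map<String, String> job) throws IOException, ClassNotFoundException {
            context = (Context) deserialize(job.get("pig.pigContext"));
        }

        String reduce(String key) {
            // No deserialization here: the job configuration does not change between calls.
            return context.name + ":" + key;
        }
    }

    /** Runs configure() once, then reduce() the given number of times; returns the deserialize count. */
    static int runDemo(int reduceCalls) {
        try {
            deserializeCalls = 0;
            Map<String, String> job = new HashMap<>();
            job.put("pig.pigContext", serialize(new Context("demo")));
            Combiner combiner = new Combiner();
            combiner.configure(job);
            for (int i = 0; i < reduceCalls; i++) {
                combiner.reduce("key" + i);
            }
            return deserializeCalls;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deserialize calls: " + runDemo(1000));
    }
}
```

In this sketch the count stays at one no matter how many reduce calls follow, whereas the old layout would deserialize once per call.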
Thanks, Alan, for checking this in. I will try to spend some more time over the
next few weeks profiling Pig a little more.
> PigCombine does not use configure method and therefore de-serialize and
> instantiate objects with every reduce call
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-108
> URL: https://issues.apache.org/jira/browse/PIG-108
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.1.0
> Reporter: Stefan Groschupf
> Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG-108-r639015-v1.patch
>
>
> There is significant room for improvement in PigCombine.
> In each reduce call, some objects are deserialized from the jobConf, and
> the object graph is also rebuilt again and again.
> Hadoop guarantees that the configure method is called before a run, so things
> like inputCount can then be cached as fields.
> The jobConf does not change between reduce calls, so re-deserializing and
> re-instantiating all of these objects
> (pigContext, evalPipe, inputCount, oc, finalout, esp, and so on)
> makes no sense from my point of view.
> I am not sure how often PigCombine is used, but fixing this should
> significantly improve performance.
> Was there a reason for doing it this way, or is it just historical?
> As soon as the test suite is running again, I would be happy to work on a patch
> if there are no other opinions on that.