[ https://issues.apache.org/jira/browse/PIG-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1336:
----------------------------

    Description: 
We serialize POStore too early in the JobControlCompiler. At that time, the storeFunc has unconstrained links to other operators; in the worst case, it chains the whole physical plan. Also, in the multi-store case, POStore has a link to its data source, which is not needed and increases the footprint of the serialized POStore.

  (was: Currently, if a Pig script contains multiple map-reduce jobs, each job serializes the plans of all map-reduce jobs into JobConf. The reason is that PhysicalOperator.inputs is built in the physical plan and in fact chains all the jobs together without regard to map-reduce boundaries. Further, when we only want to serialize POStore, we serialize this whole plan again because of POStore.inputs. This should be fixed.
)
        Summary: Optimize POStore serialized into JobConf  (was: Optimize content of mapPlan/reducePlan to be serialized into JobConf)

> Optimize POStore serialized into JobConf
> ----------------------------------------
>
>                 Key: PIG-1336
>                 URL: https://issues.apache.org/jira/browse/PIG-1336
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>         Attachments: PIG-1336-1.patch, PIG-1336-2.patch
>
>
> We serialize POStore too early in the JobControlCompiler. At that time, the
> storeFunc has unconstrained links to other operators; in the worst case, it
> chains the whole physical plan. Also, in the multi-store case, POStore has a
> link to its data source, which is not needed and increases the footprint
> of the serialized POStore.
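
The footprint concern in the description can be illustrated with a small standalone Java sketch. It does not use Pig's classes and is not the attached patch: `Op`, `buildChain`, `detachedCopy`, and `serializedSize` are hypothetical names introduced here only to show that default Java serialization follows every object reference, so an operator that still points at its inputs drags the upstream chain along with it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class StoreSerializationSketch {

    /** Hypothetical stand-in for a physical operator; NOT Pig's actual class. */
    static class Op implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        // Links to upstream operators. Default serialization follows these.
        List<Op> inputs = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }
    }

    /** Serializes the object into memory and returns the number of bytes written. */
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    /** Builds load -> op-0 -> ... -> store and returns the final "store" operator. */
    static Op buildChain(int upstreamCount) {
        Op current = new Op("load");
        for (int i = 0; i < upstreamCount; i++) {
            Op next = new Op("op-" + i);
            next.inputs.add(current);
            current = next;
        }
        Op store = new Op("store");
        store.inputs.add(current);
        return store;
    }

    /** Assumption for illustration: a copy of the store operator with no input links. */
    static Op detachedCopy(Op store) {
        return new Op(store.name);
    }

    public static void main(String[] args) throws IOException {
        Op store = buildChain(50);
        System.out.println("with inputs:    " + serializedSize(store) + " bytes");
        System.out.println("without inputs: " + serializedSize(detachedCopy(store)) + " bytes");
    }
}
```

Running `main` prints the two sizes; the copy without input links serializes to fewer bytes because the upstream operators are no longer reachable from it. Whether the real fix detaches the inputs, defers serialization, or does something else is not stated in this message.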