[ https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900332#action_12900332 ]
Thejas M Nair commented on PIG-1447:
------------------------------------

bq. Did you see any perf improvement?

No, the query is the same and the performance is the same; only the number of records reported earlier was incorrect. In fact, there was also a mistake in the calculation, which I have fixed in the updated patch for PIG-1524.

I made further modifications to L15_modified.pig to use larger columns: L15_modified2.pig (attached). With this query, the number of records dumped is 17.5 million with pig.cachedbag.memusage set to 0.1f and 20 million with 0.2f. The records are also much larger in size. I see around a 10% improvement with 0.2f.

Considering the issue in PIG-1544, and that multi-query optimized queries can have a large number of bags, I think it is safer to leave the value at 10% for now. We can add documentation on this property so that users can adjust it if they see a lot of records being proactively spilled. We should revisit this once PIG-1544 is fixed.

> Tune memory usage of InternalCachedBag
> --------------------------------------
>
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: L15_modified.pig, L15_modified2.pig
>
>
> We need to find a better value for "pig.cachedbag.memusage".