[ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899526#action_12899526 ]
Thejas M Nair commented on PIG-1544:
------------------------------------

bq. We should not be using these bags for the cases like UDF for exactly the reason you are mentioning

The case I had in mind was not one where a UDF creates proactive-spill bags. It was one where the UDF takes bags as input, those bags happen to be of a proactive-spilling type, and the UDF retains bags from previous rows.

Anyway, I have come up with a more realistic(?) use case in which it is difficult to determine how many proactive-spill bags will be present at run time:

{code}
L = load 'f1' as (c1 : int, b1 : bag{});
F1 = foreach L { d = distinct b1; generate c1, d; }  -- InternalDistinctBag will be created here
G = group F1 by c1 using 'merge';  -- This group-by could [1] accumulate several of these InternalDistinctBag objects
F2 = foreach G generate ...
{code}

[1] This does not happen because the query plan has a PORelationToExpressionProject after the result from PODistinct, and that operator copies the bag. But it looks like we could optimize the plan to avoid that copy in this case.

> proactive-spill bags should share the memory alloted for it
> -----------------------------------------------------------
>
>                 Key: PIG-1544
>                 URL: https://issues.apache.org/jira/browse/PIG-1544
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>
> Initially, proactive-spill bags were designed for use in (co)group
> (InternalCacheBag). They knew the total number of proactive bags that were
> present and shared the memory limit specified by the property
> pig.cachedbag.memusage.
> But the two proactive bag implementations added later, InternalDistinctBag
> and InternalSortedBag, are not aware of the actual number of bags being
> used; their users always assume total-numbags = 3.
> This needs to be fixed so that all proactive-spill bags share the
> memory limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
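To make the proposed fix concrete, here is a minimal Java sketch of one way all proactive-spill bags could share a single memory budget instead of each bag type assuming total-numbags = 3. `SharedSpillBudget` and its methods are hypothetical and not part of Pig. The sketch assumes each bag registers when it is created, unregisters when it is discarded, and re-reads its limit whenever it decides whether to spill. The heap size and fraction in `main` are illustrative values only.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Pig's actual code): one memory budget shared by all
 * live proactive-spill bags, divided by the number of bags that actually exist.
 */
public class SharedSpillBudget {
    // Total bytes allowed for all proactive-spill bags together,
    // e.g. max heap * the fraction configured via pig.cachedbag.memusage.
    private final long totalBudgetBytes;
    // Number of proactive-spill bags currently alive.
    private final AtomicInteger liveBags = new AtomicInteger();

    public SharedSpillBudget(long maxHeapBytes, double memUsageFraction) {
        this.totalBudgetBytes = (long) (maxHeapBytes * memUsageFraction);
    }

    /** Called when a proactive-spill bag is created. */
    public void register() {
        liveBags.incrementAndGet();
    }

    /** Called when a proactive-spill bag is cleared or discarded. */
    public void unregister() {
        liveBags.decrementAndGet();
    }

    /** Per-bag limit: the shared budget split among all currently live bags. */
    public long perBagLimitBytes() {
        int n = Math.max(1, liveBags.get()); // avoid division by zero
        return totalBudgetBytes / n;
    }

    /** The behavior described in the issue: every bag assumes exactly 3 bags. */
    public long fixedThreeBagLimitBytes() {
        return totalBudgetBytes / 3;
    }

    public static void main(String[] args) {
        // Illustrative values: 1 GiB heap, 20% reserved for proactive-spill bags.
        SharedSpillBudget budget = new SharedSpillBudget(1L << 30, 0.2);
        for (int i = 0; i < 10; i++) {
            budget.register(); // e.g. ten InternalDistinctBag objects held at once
        }
        System.out.println("shared limit per bag:   " + budget.perBagLimitBytes());
        System.out.println("fixed 3-bag assumption: " + budget.fixedThreeBagLimitBytes());
    }
}
```

With ten live bags, the fixed assumption lets each bag use a third of the budget, so together they could use roughly 3.3 times the configured limit, while the shared version keeps the total within it. Because the limit is recomputed from the live-bag count, a plan like the merge group-by above, where the number of accumulated InternalDistinctBag objects is unknown in advance, would shrink each bag's share as more bags appear.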