[ 
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899526#action_12899526
 ] 

Thejas M Nair commented on PIG-1544:
------------------------------------

bq. We should not be using these bags for the cases like UDF for exactly the 
reason you are mentioning 
The case I had in mind was not one where the UDF creates proactive-spill bags, 
but one where the UDF takes bags as input, those bags happen to be of the 
proactive-spilling type, and the UDF retains bags from previous rows.

Anyway, I have come up with a more realistic(?) use case where it is difficult 
to determine the number of proactive-spill bags that will be present at run 
time -

{code}
L = load 'f1' as ( c1 : int, b1 : bag{ } );
-- InternalDistinctBag will be created here
F1 = foreach L { d = distinct b1; generate c1, d; }
-- This group-by could [1] accumulate several of these InternalDistinctBag objects
G = group F1 by c1 using 'merge';
F2 = foreach G generate ...
{code}

[1] - This does not actually happen, because the query plan has a 
PORelationToExpressionProject after the result from PODistinct, which copies the 
bag. But it looks like we can optimize and get rid of that bag in this case.



> proactive-spill bags should share the memory alloted for it
> -----------------------------------------------------------
>
>                 Key: PIG-1544
>                 URL: https://issues.apache.org/jira/browse/PIG-1544
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>
> Initially, proactive-spill bags were designed for use in (co)group 
> (InternalCacheBag); they knew the total number of proactive bags that were 
> present, and shared the memory limit specified using the property 
> pig.cachedbag.memusage.
> But the two proactive bag implementations that were added later, 
> InternalDistinctBag and InternalSortedBag, are not aware of the actual number 
> of bags being used; their users always assume total-numbags = 3.
> This needs to be fixed so that all proactive-spill bags share the 
> memory limit.
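
To make the arithmetic behind the description concrete, here is a rough Java 
sketch. It is not Pig's actual code: the class name, method names, heap size 
and fraction are all hypothetical, with the fraction standing in for whatever 
is configured through pig.cachedbag.memusage. It only shows what happens when 
each bag sizes itself for an assumed bag count that differs from the real one.

```java
// Hypothetical illustration only; none of these names exist in Pig.
public final class SharedSpillBudget {
    private SharedSpillBudget() {}

    /** Bytes one bag may keep in memory if the budget is split evenly among numBags. */
    static long perBagLimit(long heapBytes, double memUsageFraction, int numBags) {
        if (numBags < 1) {
            throw new IllegalArgumentException("numBags must be >= 1");
        }
        // Total budget for all proactive-spill bags (assumed: a fraction of the heap).
        long totalBudget = (long) (heapBytes * memUsageFraction);
        return totalBudget / numBags;
    }

    /**
     * Worst-case bytes held in memory when every bag sizes itself for
     * assumedBags, but actualBags of them are alive at the same time.
     */
    static long worstCaseTotal(long heapBytes, double memUsageFraction,
                               int assumedBags, int actualBags) {
        return perBagLimit(heapBytes, memUsageFraction, assumedBags) * actualBags;
    }

    public static void main(String[] args) {
        long heap = 1200L;       // made-up heap size in bytes
        double fraction = 0.25;  // made-up stand-in for pig.cachedbag.memusage

        System.out.println("intended budget:      " + (long) (heap * fraction));
        System.out.println("limit per bag (of 3): " + perBagLimit(heap, fraction, 3));
        // Twelve live bags that each assume they are one of three.
        System.out.println("worst case, 12 bags:  " + worstCaseTotal(heap, fraction, 3, 12));
    }
}
```

If the bags instead shared one count of live bags, the per-bag limit would 
shrink as the count grows, and the worst-case total would stay within the 
intended budget.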

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
