[ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899526#action_12899526 ]
Thejas M Nair commented on PIG-1544:
------------------------------------
bq. We should not be using these bags for the cases like UDF for exactly the reason you are mentioning
The case I had in mind was not one where a UDF creates proactive-spill bags,
but one where the UDF's input contains bags that happen to be of the
proactive-spilling type, and the UDF retains bags from previous rows.
Anyway, I have come up with a more realistic(?) use case in which it is difficult
to determine how many proactive-spill bags will be present at run time:
{code}
L = load 'f1' as ( c1 : int, b1 : bag{ } );
F1 = foreach L { d = distinct b1; generate c1, d; } -- an InternalDistinctBag will be created here
G = group F1 by c1 using 'merge'; -- this group-by could [1] accumulate several of these InternalDistinctBag objects
F2 = foreach G generate ...
-- [1] This does not happen, because the query plan has a
-- PORelationToExpressionProject after the result from PODistinct, which copies the
-- bag. But it looks like we can optimize this and get rid of that copy in this case.
{code}
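Because the number of live proactive-spill bags cannot be known up front in a case like this, one option is to track it at run time and derive each bag's limit from the current count. The sketch below is hypothetical and is not Pig's API: `SharedSpillBudget`, `register()`, `unregister()` and `perBagLimit()` are invented names. It assumes the total budget is the max heap multiplied by the pig.cachedbag.memusage fraction, and that every bag registers when it is created and unregisters explicitly when it is cleared.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not part of Pig): a shared budget that proactive-spill
 * bags register with, so each bag's limit follows the number of bags that are
 * actually live instead of a hard-coded count such as 3.
 */
public class SharedSpillBudget {
    private final long totalBytes;                       // maxHeap * memusage fraction
    private final AtomicInteger liveBags = new AtomicInteger();

    public SharedSpillBudget(long maxHeapBytes, float memUsageFraction) {
        this.totalBytes = (long) (maxHeapBytes * memUsageFraction);
    }

    /** Assumed to be called when a proactive-spill bag is created. */
    public void register() {
        liveBags.incrementAndGet();
    }

    /** Assumed to be called when a bag is cleared or discarded. */
    public void unregister() {
        liveBags.decrementAndGet();
    }

    /** Current per-bag limit; a bag would re-read this before deciding to spill. */
    public long perBagLimit() {
        int n = Math.max(1, liveBags.get());             // avoid dividing by zero
        return totalBytes / n;
    }

    public static void main(String[] args) {
        SharedSpillBudget budget = new SharedSpillBudget(1_000_000_000L, 0.25f);
        // e.g. five InternalDistinctBag-like objects retained by the group-by
        for (int i = 0; i < 5; i++) {
            budget.register();
        }
        System.out.println(budget.perBagLimit());        // prints 50000000
    }
}
```

A real implementation would also have to cope with bags that become unreachable without being unregistered (for example by tracking them through weak references), and with bags that already hold more than a newly reduced limit; this sketch ignores both.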
> proactive-spill bags should share the memory alloted for it
> -----------------------------------------------------------
>
> Key: PIG-1544
> URL: https://issues.apache.org/jira/browse/PIG-1544
> Project: Pig
> Issue Type: Bug
> Reporter: Thejas M Nair
>
> Initially, proactive-spill bags were designed for use in (co)group
> (InternalCachedBag); they knew the total number of proactive bags that were
> present, and they shared the memory limit specified by the property
> pig.cachedbag.memusage.
> However, the two proactive bag implementations that were added later,
> InternalDistinctBag and InternalSortedBag, are not aware of the actual number of
> bags being used; their users always assume total-numbags = 3.
> This needs to be fixed so that all proactive-spill bags share the
> memory limit.
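The arithmetic behind the problem can be sketched as follows, assuming each bag computes its own limit as (max heap * pig.cachedbag.memusage) / assumed bag count. `FixedBagCountOvercommit`, `perBagLimit` and `combinedLimit` are hypothetical names for illustration, and 0.25 is an arbitrary example value for the property.

```java
/**
 * Illustrative arithmetic only (hypothetical class, not part of Pig): when every
 * bag divides the configured budget by a fixed assumed count, the combined limit
 * of all live bags drifts away from the budget as the real count changes.
 */
public class FixedBagCountOvercommit {
    public static long perBagLimit(long maxHeap, double memUsage, int assumedBagCount) {
        return (long) (maxHeap * memUsage) / assumedBagCount;
    }

    public static long combinedLimit(long maxHeap, double memUsage,
                                     int assumedBagCount, int actualBags) {
        return perBagLimit(maxHeap, memUsage, assumedBagCount) * actualBags;
    }

    public static void main(String[] args) {
        long maxHeap = 1_200_000_000L;
        double memUsage = 0.25;                      // example pig.cachedbag.memusage value
        long budget = (long) (maxHeap * memUsage);   // configured budget: 300000000
        for (int actual : new int[] {1, 3, 10}) {
            long combined = combinedLimit(maxHeap, memUsage, 3, actual);
            System.out.printf("actual=%d combined=%d budget=%d%n", actual, combined, budget);
        }
    }
}
```

With the count fixed at 3, the combined limit matches the configured budget only when exactly three bags are live. With ten bags, together they may hold more than three times the budget; with one bag, it spills early while most of the budget goes unused.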
--
This message is automatically generated by JIRA.