[
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900440#action_12900440
]
Thejas M Nair commented on PIG-1544:
------------------------------------
bq. While computing the number of bags, we should remember to consider the
multi-query case as well.
In the multi-query case, the sub-plans for each query are executed one at a
time for a given tuple with large bags. So the number of large bags that
cannot be garbage collected would be similar to that of a single query.
Another thing to keep in mind is that multiple bags working on a common input
(as with distinct/order-by in a nested foreach) would share some or most of
their memory with the input bag, because Pig does not create copies of the
column objects.
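To illustrate the sharing point, here is a minimal sketch in which plain Java collections stand in for Pig bags. The class and method names are hypothetical and are not Pig's API; the sketch only shows that a derived "distinct" collection holds references to the same element objects as its input, so the elements themselves are not duplicated in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: plain Java collections stand in for Pig bags.
class SharedColumnSketch {
    /** Builds a de-duplicated list that references the input's own element objects. */
    static <T> List<T> distinct(List<T> input) {
        // LinkedHashSet keeps the first occurrence of each equal element
        // and stores the reference, not a copy of the object.
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("a");
        input.add("b");
        input.add("a");

        List<String> result = distinct(input);

        // Reference comparison: checks whether both lists point at the same object.
        System.out.println("distinct size: " + result.size());
        System.out.println("same object as input: " + (result.get(0) == input.get(0)));
    }
}
```

Only the container overhead (the list and set structures) is new memory here; the element objects are shared with the input.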
> proactive-spill bags should share the memory allotted for it
> ------------------------------------------------------------
>
> Key: PIG-1544
> URL: https://issues.apache.org/jira/browse/PIG-1544
> Project: Pig
> Issue Type: Bug
> Reporter: Thejas M Nair
>
> Initially, proactive-spill bags were designed for use in (co)group
> (InternalCachedBag). They knew the total number of proactive bags that were
> present, and shared the memory limit specified using the property
> pig.cachedbag.memusage.
> But the two proactive bag implementations that were added later,
> InternalDistinctBag and InternalSortedBag, are not aware of the actual number
> of bags being used; their users always assume total-numbags = 3.
> This needs to be fixed so that all proactive-spill bags share the
> memory limit.
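
As a rough illustration of the problem described above, the sketch below divides a single memory budget among bags. Everything in it is an assumption made for illustration: the class and method names are hypothetical and are not Pig's API, and the heap size and fraction are passed in as plain numbers rather than read from pig.cachedbag.memusage. It shows that when each bag sizes its limit for 3 bags while more than 3 are live, the combined worst case exceeds the intended budget.

```java
// Hypothetical sketch: not Pig's actual classes or API.
class SharedBagMemoryBudget {
    /** Bytes each bag may hold in memory before it should spill. */
    static long perBagLimit(long heapBytes, double memUsageFraction, int bagCount) {
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be >= 1");
        }
        long totalBudget = (long) (heapBytes * memUsageFraction);
        return totalBudget / bagCount;
    }

    /** Worst-case combined usage if every live bag fills up to its own limit. */
    static long worstCaseTotal(long heapBytes, double memUsageFraction,
                               int assumedBags, int actualBags) {
        return perBagLimit(heapBytes, memUsageFraction, assumedBags) * actualBags;
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024L * 1024L; // assumed 1 GiB heap, for illustration
        double fraction = 0.25;            // illustrative value, not a Pig default
        long budget = (long) (heap * fraction);

        // Each bag assumes 3 bags exist, but 6 are actually live.
        long fixedAssumption = worstCaseTotal(heap, fraction, 3, 6);
        // Each bag knows the real count of 6.
        long sharedCount = worstCaseTotal(heap, fraction, 6, 6);

        System.out.println("budget:            " + budget);
        System.out.println("assuming 3 bags:   " + fixedAssumption);
        System.out.println("using real count:  " + sharedCount);
    }
}
```

With a shared, accurate bag count the worst case stays within the budget; with a fixed assumption of 3 it grows linearly with the number of live bags.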