[
https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900939#action_12900939
]
Thejas M Nair commented on PIG-1447:
------------------------------------
Some more reasons why a higher value would still be safe:
1. Much of the memory attributed to the InternalDistinct/InternalSorted bags
used within a nested foreach is shared with the InternalCachedBag in the
input tuple, because Pig does not create a copy of the column objects.
2. In a nested foreach, only one inner plan holds references to the Internal*
bags at any given time. The Internal* bags are eventually converted to
DefaultDataBag by RelationToExpressionProject in these plans. In the most
common cases (say, generating multiple count-distincts or order-bys on bags in
a nested foreach), that means only one Internal* bag created within the nested
foreach is referenced at a time. I compared the memory footprint for different
numbers of distinct operations in a nested foreach and found them to be in the
same range.
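As an illustration of the common case in point 2, a nested foreach with two
distinct counts might look like the sketch below. The input path, relation
names, and field names are hypothetical and are not taken from the attached
scripts.

```pig
-- Hypothetical input: one row per page view.
views = LOAD 'views' AS (user:chararray, url:chararray, referrer:chararray);
grouped = GROUP views BY user;

counts = FOREACH grouped {
    -- Each nested DISTINCT builds its own bag from a column of the grouped bag.
    urls = DISTINCT views.url;
    refs = DISTINCT views.referrer;
    GENERATE group AS user, COUNT(urls) AS n_urls, COUNT(refs) AS n_refs;
};
```

Per the reasoning above, the two distinct bags here would not be expected to
be referenced at the same time, so adding a third DISTINCT should leave the
memory footprint in roughly the same range.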
I am planning to set the default to 20% for now. If we find that memory limits
are being hit as a result during the beta testing period, we can reduce the
default.
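For reference, a minimal sketch of overriding the value, assuming the property
is read as a fraction of the heap (so 20% is written as 0.2) and is picked up
from a pig.properties file:

```properties
# Assumption: the value is a fraction of the JVM heap, so 0.2 means 20%.
# Lower it if memory limits are being hit.
pig.cachedbag.memusage=0.2
```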
> Tune memory usage of InternalCachedBag
> --------------------------------------
>
> Key: PIG-1447
> URL: https://issues.apache.org/jira/browse/PIG-1447
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Daniel Dai
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: L15_modified.pig, L15_modified2.pig, PIG-1447.1.patch
>
>
> We need to find a better value for "pig.cachedbag.memusage".