[
https://issues.apache.org/jira/browse/PIG-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583098#action_12583098
]
Alan Gates commented on PIG-170:
--------------------------------
Pi is correct that there are scenarios where a big bag could be sorted ahead of
smaller bags that will empty faster. But to get into this circumstance, a very
specific set of conditions has to occur:
1) many small bags are created
2) small bags are moved into a large bag, without yet being released
3) spill happens, forcing a sort of the linked list
4) small bags go out of scope
This set of events seems fairly rare. And even when it does occur, the worst
that happens is we are not as aggressive as we could be about cleaning the
list. In the very worst case it will cause an early spill.
We cannot clean the entire list on every register call, as that is far too
expensive (I tried it, and it slowed performance by an order of magnitude on
large scripts). We want to spill large bags first so that we spill as few bags as
possible. We could change the code to copy the list and sort that copy, thus
avoiding reordering the existing list. However, once we're in the spill code,
we are in a low memory situation. Copying a potentially large list to sort it
is a bad idea in that case. So I don't see a better solution.
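
To make the trade-off concrete, here is a minimal sketch of the approach
described above: register is kept cheap, and the list is only cleaned and
sorted (largest first) once a spill is requested. It is not the actual Pig
code. Spillable, SketchBag, SketchMemoryManager and handleLowMemory are
hypothetical names chosen for illustration, and "spilling" here only zeroes a
counter instead of writing to disk.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a spillable bag; not Pig's actual interface.
interface Spillable {
    long getMemorySize();   // estimated bytes currently held in memory
    long spill();           // release in-memory contents, return bytes freed
}

// Toy bag that only tracks a size; "spilling" just zeroes it.
class SketchBag implements Spillable {
    private long size;
    SketchBag(long size) { this.size = size; }
    public long getMemorySize() { return size; }
    public long spill() { long freed = size; size = 0; return freed; }
}

class SketchMemoryManager {
    // Weak references, so bags that go out of scope can still be collected.
    private final LinkedList<WeakReference<Spillable>> spillables = new LinkedList<>();

    // Registration stays cheap: no scan or cleanup of the whole list here.
    void register(Spillable s) {
        spillables.add(new WeakReference<>(s));
    }

    // A cleared reference counts as size 0 so it sorts to the end.
    private static long sizeOf(WeakReference<Spillable> ref) {
        Spillable s = ref.get();
        return s == null ? 0L : s.getMemorySize();
    }

    // Called only in a low-memory situation. Reorders the manager's own
    // list (largest first) rather than sorting a second list, then spills
    // until the requested number of bytes has been freed.
    List<Spillable> handleLowMemory(long bytesToFree) {
        spillables.removeIf(ref -> ref.get() == null);   // drop dead entries
        spillables.sort((a, b) -> Long.compare(sizeOf(b), sizeOf(a)));

        List<Spillable> spilled = new ArrayList<>();
        long freed = 0;
        for (WeakReference<Spillable> ref : spillables) {
            if (freed >= bytesToFree) break;             // spilled enough
            Spillable s = ref.get();
            if (s == null || s.getMemorySize() == 0) continue;
            freed += s.spill();
            spilled.add(s);
        }
        return spilled;
    }
}
```

With bags of size 10, 1000 and 100 registered in that order, a request to free
500 bytes spills only the 1000-byte bag. That is the point of sorting largest
first: as few bags as possible are spilled.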
> Memory manager spills bags in the wrong order
> ---------------------------------------------
>
> Key: PIG-170
> URL: https://issues.apache.org/jira/browse/PIG-170
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Amir Youssefi
> Attachments: compareMemUsage.gif, PIG-170_0_20080327.patch
>
>
> For optimal performance, we want to spill the largest bags first. This is not
> what is happening right now and could be causing some of our memory issues.