[ 
https://issues.apache.org/jira/browse/PIG-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583098#action_12583098
 ] 

Alan Gates commented on PIG-170:
--------------------------------

Pi is correct that there are scenarios where a big bag could be sorted ahead of 
smaller bags that will empty faster.  But to get into this circumstance, a very 
specific set of conditions has to occur:

1) many small bags are created
2) small bags are moved into a large bag, without yet being released
3) spill happens, forcing a sort of the linked list
4) small bags go out of scope

This set of events seems fairly rare.  And even when it does occur, the worst 
that happens is we are not as aggressive as we could be about cleaning the 
list.  In the very worst case it will cause an early spill.

We cannot clean the entire list on every register call, as that is far too 
expensive (I tried it, and it slowed performance by an order of magnitude on 
large scripts).  We want to spill large bags first so that we spill as few bags 
as possible.  We could change the code to copy the list and sort that copy, thus 
possible.  We could change the code to copy the list and sort that copy, thus 
avoiding reordering the existing list.  However, once we're in the spill code, 
we are in a low memory situation.  Copying a potentially large list to sort it 
is a bad idea in that case.  So I don't see a better solution.
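
To make the trade-off concrete, here is a rough sketch of the approach described 
above: registration stays cheap, and the spill path reorders the registered list 
itself, largest bag first, instead of sorting a separate copy.  All names here 
(SpillSketch, Bag, register, spillUntil) are made up for illustration and are 
not taken from the Pig source; the real memory manager may differ in detail.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch; none of these names come from the Pig code base.
class SpillSketch {
    /** Minimal stand-in for a bag that can be written to disk. */
    static class Bag {
        private long bytesInMemory;

        Bag(long bytesInMemory) { this.bytesInMemory = bytesInMemory; }

        long getMemorySize() { return bytesInMemory; }

        /** Pretends to write the contents to disk; returns the bytes freed. */
        long spill() {
            long freed = bytesInMemory;
            bytesInMemory = 0;
            return freed;
        }
    }

    // Weak references, so registration alone never keeps a bag alive.
    private final LinkedList<WeakReference<Bag>> registered = new LinkedList<>();

    /** Deliberately cheap: the list is not cleaned on the register path. */
    void register(Bag bag) {
        registered.add(new WeakReference<>(bag));
    }

    /** A bag that has already been collected counts as size zero. */
    private static long sizeOf(WeakReference<Bag> ref) {
        Bag bag = ref.get();
        return bag == null ? 0L : bag.getMemorySize();
    }

    /** Spills the largest live bags first until bytesNeeded has been freed. */
    long spillUntil(long bytesNeeded) {
        // Reorder the registered list itself rather than keeping a second,
        // separately sorted list around. (The JDK's List.sort may still use
        // a temporary array internally.)
        registered.sort((a, b) -> Long.compare(sizeOf(b), sizeOf(a)));

        long freed = 0;
        Iterator<WeakReference<Bag>> it = registered.iterator();
        while (it.hasNext() && freed < bytesNeeded) {
            Bag bag = it.next().get();
            if (bag == null) {
                it.remove();          // clean dead entries only as we pass them
            } else {
                freed += bag.spill(); // largest remaining bag goes first
            }
        }
        return freed;
    }
}
```

In this sketch, dead references are dropped only when the spill loop walks over 
them, which mirrors the "not as aggressive as we could be" behavior: entries 
sorted behind a large bag are not cleaned until a later spill reaches them.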

> Memory manager spills bags in the wrong order
> ---------------------------------------------
>
>                 Key: PIG-170
>                 URL: https://issues.apache.org/jira/browse/PIG-170
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Amir Youssefi
>         Attachments: compareMemUsage.gif, PIG-170_0_20080327.patch
>
>
> For optimal performance, we want to spill the largest bags first. This is not 
> what is happening right now and could be causing some of our memory issues.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
