[ 
https://issues.apache.org/jira/browse/PIG-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580852#action_12580852
 ] 

Benjamin Reed commented on PIG-164:
-----------------------------------

+1 excellent

> In scripts that create large groups pig runs out of memory
> ----------------------------------------------------------
>
>                 Key: PIG-164
>                 URL: https://issues.apache.org/jira/browse/PIG-164
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: PIG-164.patch
>
>
> Scripts that need to group large amounts of data, such as a group all with 
> 20m records, often die with errors indicating that no more memory can be 
> allocated.  PIG-40 addressed this somewhat, but not completely.  In fact, it 
> appears that in some situations it made it worse.  If a script creates many 
> data bags, it can now run out of memory just tracking the bags it may need 
> to spill, even if none of those bags grows very large.
> The issue is that the fix for PIG-40 introduced a memory manager that keeps 
> a LinkedList of WeakReferences to track these data bags.  When it is told to 
> dump memory, it walks this LinkedList, removing any entries that have gone 
> stale and spilling any that are still valid.  The problem is that in a 
> script that processes many rows, the LinkedList itself grows very large and 
> becomes the reason memory needs to be dumped in the first place.
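
The bookkeeping pattern described above can be sketched as follows. This is a minimal illustration of the problem, not Pig's actual implementation: the names BagRegistry, register, and spillAll are hypothetical, and real data bags are simplified to plain Objects. The key point is that the list grows by one node per bag registered and is only trimmed when a spill is triggered, so with millions of rows the tracking list itself becomes a significant memory consumer.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch of the PIG-40-style bag tracking described above.
class BagRegistry {
    private final LinkedList<WeakReference<Object>> bags = new LinkedList<>();

    // Every bag created adds one list node; nothing is trimmed here,
    // so the list grows with the number of bags, not their size.
    void register(Object bag) {
        bags.add(new WeakReference<>(bag));
    }

    // Invoked only when a memory dump is requested: walk the whole list,
    // dropping stale entries and counting live ones (which would be
    // spilled to disk in the real implementation).
    int spillAll() {
        int live = 0;
        for (Iterator<WeakReference<Object>> it = bags.iterator(); it.hasNext(); ) {
            Object bag = it.next().get();
            if (bag == null) {
                it.remove();   // stale: referent already collected
            } else {
                live++;        // live: spill would happen here
            }
        }
        return live;
    }

    int trackedCount() {
        return bags.size();
    }

    public static void main(String[] args) {
        BagRegistry registry = new BagRegistry();
        for (int i = 0; i < 100_000; i++) {
            registry.register(new Object());
        }
        // One WeakReference plus one LinkedList node per bag survives
        // until the next spill pass, even if every bag is tiny.
        System.out.println("tracked entries: " + registry.trackedCount());
    }
}
```

Note that stale entries are only reclaimed during a spill pass, so between passes the list holds a node for every bag ever registered, live or dead.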

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.