[
https://issues.apache.org/jira/browse/PIG-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-164:
---------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Fix checked in.
> In scripts that create large groups pig runs out of memory
> ----------------------------------------------------------
>
> Key: PIG-164
> URL: https://issues.apache.org/jira/browse/PIG-164
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.0.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: PIG-164.patch
>
>
> Scripts that need to group large amounts of data, such as a group all over
> 20m records, often die with errors indicating that no more memory can be
> allocated. PIG-40 addressed this somewhat, but not completely. In fact, it
> appears that in some situations it made things worse. If a script creates
> many data bags, it can now run out of memory just tracking the bags it may
> need to spill, even if none of those bags ever grows very large.
> The issue is that the fix for PIG-40 introduced a memory manager that keeps
> a LinkedList of WeakReferences to track these data bags. When the manager
> is told to free memory, it walks this LinkedList, removing any entries that
> have gone stale and spilling any bags that are still live. The problem is
> that in a script that processes many rows, the LinkedList itself grows very
> large and becomes the reason memory needs to be dumped.
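The tracking pattern described in the issue can be sketched as below. This is a hypothetical illustration of the problem, not Pig's actual memory-manager code; the class and method names (SpillableTracker, register, spillAll) are invented for the example. The key point is that register() appends to the list on every bag creation, while stale entries are only pruned inside spillAll(), so between spill events the LinkedList grows without bound.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch of the tracking pattern described above; not Pig's API.
class SpillableTracker {
    private final LinkedList<WeakReference<Object>> registered = new LinkedList<>();

    // Every bag created registers itself. The list grows with each row
    // processed, even if the bags themselves stay small.
    void register(Object bag) {
        registered.add(new WeakReference<>(bag));
    }

    // Only when asked to free memory is the list walked: stale entries are
    // removed and live bags would be spilled. Between such calls the
    // LinkedList itself can become the dominant memory consumer.
    void spillAll() {
        Iterator<WeakReference<Object>> it = registered.iterator();
        while (it.hasNext()) {
            Object bag = it.next().get();
            if (bag == null) {
                it.remove();       // entry went stale; clean it up
            } else {
                // spill(bag);     // dump the live bag's contents to disk
            }
        }
    }

    int trackedCount() {
        return registered.size();
    }
}
```

A fix along the lines checked in here would keep the tracking structure from growing in proportion to the number of bags ever created, rather than the number currently live.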
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.