In scripts that create large groups, Pig runs out of memory
-----------------------------------------------------------

                 Key: PIG-164
                 URL: https://issues.apache.org/jira/browse/PIG-164
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.0.0
            Reporter: Alan Gates
            Assignee: Alan Gates


Scripts that need to group large amounts of data, such as a group all over
20 million records, often die with errors indicating that no more memory can
be allocated.  PIG-40 addressed this somewhat, but not completely.  In fact,
it appears that in some situations it made things worse: if a script creates
many data bags, it can now run out of memory just tracking the bags it may
need to spill, even if none of those bags ever gets very large.

The issue is that the fix for PIG-40 introduced a memory manager that keeps a
LinkedList of WeakReferences to track these data bags.  When the manager is
notified that memory is running low, it walks this LinkedList, removing any
entries whose bags have already been garbage collected and spilling any that
are still live.  The problem is that in a script that processes many rows,
the LinkedList itself grows very large and becomes the cause of the very
low-memory condition it is supposed to relieve.
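
To make the failure mode concrete, here is a minimal Java sketch of the
bookkeeping described above.  The names (Spillable, MemoryManager, register,
handleNotification) are illustrative placeholders, not Pig's actual API; the
point is only that registration adds one list node per bag, while pruning
happens only when a low-memory notification fires.

    import java.lang.ref.WeakReference;
    import java.util.Iterator;
    import java.util.LinkedList;

    // Illustrative stand-in for a spillable data bag.
    interface Spillable {
        long spill();   // write in-memory contents to disk, return bytes freed
    }

    // Sketch of the bookkeeping described above.  Every bag registers itself
    // on creation, and nothing removes its node until a low-memory
    // notification arrives.  A script that creates one bag per row therefore
    // accumulates one LinkedList node plus one WeakReference per row, no
    // matter how small the bags are -- the growth this issue reports.
    class MemoryManager {
        private final LinkedList<WeakReference<Spillable>> spillables =
            new LinkedList<WeakReference<Spillable>>();

        public synchronized void register(Spillable s) {
            spillables.add(new WeakReference<Spillable>(s));
        }

        // Called on memory pressure: prune entries whose bags have been
        // collected, and spill the ones that are still live.
        public synchronized long handleNotification() {
            long freed = 0;
            Iterator<WeakReference<Spillable>> it = spillables.iterator();
            while (it.hasNext()) {
                Spillable s = it.next().get();
                if (s == null) {
                    it.remove();        // stale entry: drop it
                } else {
                    freed += s.spill(); // live bag: push contents to disk
                }
            }
            return freed;
        }
    }

Under this scheme, the list's footprint is proportional to the number of bags
ever created between notifications, not to the size of any bag.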
