[
https://issues.apache.org/jira/browse/PIG-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-164:
---------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Fix checked in.
> In scripts that create large groups pig runs out of memory
> ----------------------------------------------------------
>
> Key: PIG-164
> URL: https://issues.apache.org/jira/browse/PIG-164
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.0.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: PIG-164.patch
>
>
> Scripts that need to group large amounts of data, such as a group all over
> 20m records, often die with errors indicating that no more memory can be
> allocated. PIG-40 addressed this somewhat, but not completely. In fact, it
> appears that in some situations it made things worse. If a script creates
> many data bags, it can now run out of memory just tracking the bags it may
> need to spill, even if none of those bags ever grows very large.
> The issue is that the fix for PIG-40 introduced a memory manager that keeps
> a LinkedList of WeakReferences to track these data bags. When the manager
> is told to free memory, it walks this LinkedList, removing any entries that
> have gone stale and spilling any bags that are still live. The problem is
> that in a script that processes many rows, the LinkedList itself grows very
> large and becomes the reason memory needs to be dumped.
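The tracking pattern described in the issue can be sketched as below. This is a hypothetical illustration of the problem, not Pig's actual memory-manager code; the class and method names (SpillableTracker, register, spillAll) are invented for the example. The key point is that register() appends to the list on every bag creation, while stale entries are only pruned inside spillAll(), so between spill events the LinkedList grows without bound.

```java
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch of the tracking pattern described above; not Pig's API.
class SpillableTracker {
    private final LinkedList<WeakReference<Object>> registered = new LinkedList<>();

    // Every bag created registers itself. The list grows with each row
    // processed, even if the bags themselves stay small.
    void register(Object bag) {
        registered.add(new WeakReference<>(bag));
    }

    // Only when asked to free memory is the list walked: stale entries are
    // removed and live bags would be spilled. Between such calls the
    // LinkedList itself can become the dominant memory consumer.
    void spillAll() {
        Iterator<WeakReference<Object>> it = registered.iterator();
        while (it.hasNext()) {
            Object bag = it.next().get();
            if (bag == null) {
                it.remove();       // entry went stale; clean it up
            } else {
                // spill(bag);     // dump the live bag's contents to disk
            }
        }
    }

    int trackedCount() {
        return registered.size();
    }
}
```

A fix along the lines checked in here would keep the tracking structure from growing in proportion to the number of bags ever created, rather than the number currently live.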
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.