Ying He commented on PIG-975:

Answers to Olga's questions:

1. The synchronization can be removed. 
2. The memory fraction is configurable. The property name is
pig.cachedbag.memusage; the default value is 0.5.
3. The first 100 tuples are used to estimate the in-memory tuple size, which
determines how many tuples fit into the configured memusage. 100 is not the
number of tuples kept in memory.
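As a rough illustration of items 2 and 3, the sketch below shows how a memory fraction and a 100-tuple sample could be combined into a tuple-count limit. It is a minimal sketch and not Pig's code. The class `CacheLimitEstimator` and its methods are hypothetical, and tuple sizes are passed in as plain byte counts. It also assumes the fraction is applied to the JVM's maximum heap. Only the property name and its 0.5 default come from the comment above.

```java
import java.util.Properties;

// Hypothetical sketch (not Pig's implementation): the average size of the
// first sampleSize tuples turns a memory budget into a tuple-count limit.
public class CacheLimitEstimator {
    // Property name and default value as stated in the comment.
    static final String MEMUSAGE_PROP = "pig.cachedbag.memusage";
    static final String MEMUSAGE_DEFAULT = "0.5";

    private final long budgetBytes;   // bytes the bag may hold in memory
    private final int sampleSize;     // number of tuples to measure, e.g. 100
    private long sampledBytes = 0;
    private int sampledCount = 0;

    // Reads the configured fraction, falling back to 0.5 when unset.
    public static float readMemUsage(Properties props) {
        return Float.parseFloat(props.getProperty(MEMUSAGE_PROP, MEMUSAGE_DEFAULT));
    }

    // heapBytes: memory the fraction applies to. ASSUMPTION: the JVM max heap,
    // e.g. Runtime.getRuntime().maxMemory().
    public CacheLimitEstimator(long heapBytes, float memUsage, int sampleSize) {
        this.budgetBytes = (long) (heapBytes * (double) memUsage);
        this.sampleSize = sampleSize;
    }

    // Records one tuple's size; tuples after the first sampleSize are ignored,
    // so the sample size is not a cap on how many tuples stay in memory.
    public void observe(long tupleSizeBytes) {
        if (sampledCount < sampleSize) {
            sampledBytes += tupleSizeBytes;
            sampledCount++;
        }
    }

    public int sampledCount() {
        return sampledCount;
    }

    // Maximum number of tuples to keep in memory; Long.MAX_VALUE until a
    // non-zero average size is known.
    public long cacheLimit() {
        if (sampledCount == 0 || sampledBytes == 0) {
            return Long.MAX_VALUE;
        }
        long avgBytes = Math.max(1, sampledBytes / sampledCount);
        return budgetBytes / avgBytes;
    }
}
```

For example, with a 1000-byte budget base, memusage 0.5, and sampled tuples of 10 bytes each, the limit works out to 500 / 10 = 50 tuples.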

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> ---------------------------------------------------------------------------------------------
>                 Key: PIG-975
>                 URL: https://issues.apache.org/jira/browse/PIG-975
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Ying He
>            Assignee: Ying He
>             Fix For: 0.2.0
>         Attachments: PIG-975.patch, PIG-975.patch2
> POPackage uses DefaultDataBag during the reduce process to hold data. It is
> registered with SpillableMemoryManager and prone to OutOfMemoryError.
> It's better to manage memory usage proactively: the bag fills memory up to a
> specified amount and dumps the rest to disk. The amount of memory used to
> hold tuples is configurable. This can avoid out-of-memory errors.
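The behavior described in the issue can be sketched as follows: fill memory up to a fixed amount, then write the rest to disk. This is a minimal illustration under stated assumptions and not the attached patch. `ProactiveSpillBag` is a hypothetical name, and tuples are modeled as `String`s. The in-memory limit is given as a tuple count rather than in bytes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch (not Pig's bag): keeps the first memLimit tuples in
// memory and proactively appends every later tuple to a temp file, without
// registering with a memory manager. Tuples are Strings for simplicity
// (writeUTF limits each one to 65535 encoded bytes).
public class ProactiveSpillBag implements Iterable<String>, AutoCloseable {
    private final int memLimit;                  // max tuples held in memory
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;                      // created on the first spill
    private DataOutputStream spillOut;
    private long spilledCount = 0;

    public ProactiveSpillBag(int memLimit) {
        this.memLimit = memLimit;
    }

    public void add(String tuple) {
        if (inMemory.size() < memLimit) {        // under budget: keep in memory
            inMemory.add(tuple);
            return;
        }
        try {                                    // over budget: append to disk
            if (spillOut == null) {
                spillFile = File.createTempFile("spillbag", ".tmp");
                spillFile.deleteOnExit();
                spillOut = new DataOutputStream(new BufferedOutputStream(
                        new FileOutputStream(spillFile)));
            }
            spillOut.writeUTF(tuple);
            spilledCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int inMemoryCount() { return inMemory.size(); }
    public long spilledCount() { return spilledCount; }

    // Yields in-memory tuples first, then reads spilled ones back from disk.
    // Adding tuples while an iterator is in use is not supported.
    @Override
    public Iterator<String> iterator() {
        final long total = spilledCount;
        try {
            if (spillOut != null) spillOut.flush(); // make spilled data readable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        final Iterator<String> memIt = inMemory.iterator();
        return new Iterator<String>() {
            private DataInputStream in;           // opened once memory is exhausted
            private long read = 0;

            @Override
            public boolean hasNext() {
                return memIt.hasNext() || read < total;
            }

            @Override
            public String next() {
                if (memIt.hasNext()) return memIt.next();
                if (read >= total) throw new NoSuchElementException();
                try {
                    if (in == null) {
                        in = new DataInputStream(new BufferedInputStream(
                                new FileInputStream(spillFile)));
                    }
                    String tuple = in.readUTF();
                    if (++read == total) in.close();
                    return tuple;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spillFile != null) spillFile.delete();
        }
    }
}
```

Because spilling happens inside `add()`, heap use stays bounded by the limit whether or not any memory manager intervenes. The trade-off is disk I/O for large bags even when the heap might have coped.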

This message is automatically generated by JIRA.