[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-10-05 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed. Thanks, Ying

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
> PIG-975.patch3, PIG-975.patch4
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.
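
The behaviour described above (keep tuples in memory up to a configured limit, write everything beyond it to disk) can be sketched roughly as follows. This is only an illustration and not the code from PIG-975.patch: the class name ProactiveSpillBag, the tuple-count limit, and the use of one text line per tuple are all assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bag that spills proactively instead of
// registering with a memory manager. Tuples are modelled as single-line
// strings (assumption: no line breaks inside a tuple).
class ProactiveSpillBag {
    private final int maxInMemory;                        // configurable limit, counted in tuples
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;                               // created lazily on first overflow
    private BufferedWriter spillWriter;
    private long spilledCount = 0;

    ProactiveSpillBag(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // Keep the tuple in memory while under the limit; otherwise append it to disk.
    void add(String tuple) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(tuple);
            return;
        }
        try {
            if (spillWriter == null) {
                spillFile = File.createTempFile("bag-spill", ".txt");
                spillFile.deleteOnExit();
                spillWriter = Files.newBufferedWriter(spillFile.toPath(), StandardCharsets.UTF_8);
            }
            spillWriter.write(tuple);
            spillWriter.newLine();
            spilledCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() {
        return inMemory.size() + spilledCount;
    }

    long spilledCount() {
        return spilledCount;
    }

    // Returns in-memory tuples followed by spilled tuples, in insertion order.
    // A real bag would stream the spilled part through an iterator rather than
    // load it back into memory; this is kept simple for the sketch.
    List<String> readAll() {
        List<String> all = new ArrayList<>(inMemory);
        if (spillWriter != null) {
            try {
                spillWriter.flush();
                all.addAll(Files.readAllLines(spillFile.toPath(), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return all;
    }
}
```

With a limit of 2, adding four tuples keeps the first two in memory and writes the last two to the spill file. The point of the design is that the bag decides for itself when to go to disk, so it does not depend on a low-memory notification arriving in time.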

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Status: Open  (was: Patch Available)

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.2.0
>
> Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
> PIG-975.patch3, PIG-975.patch4
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Fix Version/s: (was: 0.2.0)
   0.6.0
Affects Version/s: (was: 0.2.0)
   0.4.0
   Status: Patch Available  (was: Open)

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
> PIG-975.patch3, PIG-975.patch4
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-25 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Attachment: PIG-975.patch4

Add a switch back to the old bag.  Setting the property pig.cachedbag.type=default 
switches to the old default bag. If the property is not specified, InternalCachedBag is used.
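
As a rough illustration of the switch described above, the selection could look like the sketch below. Only the property name pig.cachedbag.type, the value default, and the bag names come from this thread. The BagTypeSwitch class, the chooseBagType method, and the treatment of any other value as "not specified" are assumptions for the example and do not reflect how the patch actually reads the setting.

```java
import java.util.Properties;

// Hypothetical helper: maps the pig.cachedbag.type property to a bag name.
class BagTypeSwitch {
    static String chooseBagType(Properties props) {
        String type = props.getProperty("pig.cachedbag.type");
        // "default" selects the old bag; anything else (including an unset
        // property) falls back to InternalCachedBag in this sketch.
        return "default".equals(type) ? "DefaultDataBag" : "InternalCachedBag";
    }
}
```

Calling chooseBagType with an empty Properties object yields InternalCachedBag, and calling it after setProperty("pig.cachedbag.type", "default") yields DefaultDataBag.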

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.2.0
>
> Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
> PIG-975.patch3, PIG-975.patch4
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-25 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Attachment: internalbag.xls

performance numbers 

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.2.0
>
> Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
> PIG-975.patch3
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-25 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Attachment: PIG-975.patch3

remove synchronization

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.2.0
>
> Attachments: PIG-975.patch, PIG-975.patch2, PIG-975.patch3
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Status: Patch Available  (was: Open)

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.2.0
>
> Attachments: PIG-975.patch, PIG-975.patch2
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Attachment: PIG-975.patch2

remove System.out.println

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-975.patch, PIG-975.patch2
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Attachment: PIG-975.patch

implement a new bag and use it in POPackage

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-975.patch
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-975:


Description: POPackage uses DefaultDataBag during the reduce process to hold 
data. That bag is registered with SpillableMemoryManager and is prone to 
OutOfMemoryError.  It is better to proactively manage memory usage: the bag 
fills memory up to a specified amount and dumps the rest to disk.  The amount 
of memory used to hold tuples is configurable. This can avoid out-of-memory 
errors.  (was: Currently whenever Combiner is used in pig, in the map, 
the POPrecombinerLocalRearrange operator puts the single "value" tuple 
corresponding to a key into a DataBag and passes this to the foreach which is 
being combined. This will generate as many bags as there are input records. 
These bags all will have a single tuple and hence are small and should not need 
to be spilt to disk. However since the bags are created through the BagFactory 
mechanism, each bag creation is registered with the SpillableMemoryManager and 
a weak reference to the bag is stored in a linked list. This linked list grows 
really big over time causing unnecessary Garbage collection runs. This can be 
avoided by having a simple lightweight implementation of the DataBag interface 
to store the single tuple in a bag. Also these SingleTupleBags should be 
created without registering with the spillableMemoryManager. Likewise the bags 
created in POCombinePackage are supposed to fit in Memory and not spill. Again 
a NonSpillableDataBag implementation of DataBag interface which does not 
register with the SpillableMemoryManager would help.
)

> Need a databag that does not register with SpillableMemoryManager and spill 
> data pro-actively
> -
>
> Key: PIG-975
> URL: https://issues.apache.org/jira/browse/PIG-975
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Ying He
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
>
> POPackage uses DefaultDataBag during the reduce process to hold data. That bag 
> is registered with SpillableMemoryManager and is prone to OutOfMemoryError.  
> It is better to proactively manage memory usage: the bag fills memory up to a 
> specified amount and dumps the rest to disk.  The amount of memory used to hold 
> tuples is configurable. This can avoid out-of-memory errors.
