[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760485#action_12760485
 ] 

Hadoop QA commented on PIG-975:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420603/PIG-975.patch4
  against trunk revision 819691.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 406 javac compiler warnings (more 
than the trunk's current 403 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

-1 release audit.  The applied patch generated 278 release audit warnings 
(more than the trunk's current 277 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/console

This message is automatically generated.

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3, PIG-975.patch4


 POPackage uses DefaultDataBag during the reduce phase to hold data. The bag is 
 registered with SpillableMemoryManager and is prone to out-of-memory errors.  
 It is better to manage memory usage proactively. The bag fills memory up to a 
 specified amount and dumps the rest to disk.  The amount of memory used to 
 hold tuples is configurable. This can avoid out-of-memory errors.
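
As a rough illustration of the behaviour described above, here is a minimal Java sketch of a bag that keeps a fixed number of items in memory and writes everything beyond that to a temporary file. It is not the code from the attached patches: the class name ProactiveSpillBagSketch, the use of plain strings as stand-ins for tuples, and the count-based limit are all assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's actual bag. "Tuples" are single-line strings.
class ProactiveSpillBagSketch {
    private final int maxInMemory;                  // assumed cap on items held in memory
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;                         // created lazily on the first spill
    private BufferedWriter spillWriter;
    private long spilledCount = 0;

    ProactiveSpillBagSketch(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // Keep the item in memory while there is room; otherwise append it to disk.
    void add(String tuple) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(tuple);
            return;
        }
        try {
            if (spillWriter == null) {
                spillFile = File.createTempFile("bag-sketch", ".spill");
                spillFile.deleteOnExit();
                spillWriter = Files.newBufferedWriter(spillFile.toPath(), StandardCharsets.UTF_8);
            }
            spillWriter.write(tuple);
            spillWriter.newLine();
            spilledCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() {
        return inMemory.size() + spilledCount;
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    // Return the in-memory items first, then the spilled ones, in insertion order.
    List<String> readAll() {
        List<String> all = new ArrayList<>(inMemory);
        if (spillWriter != null) {
            try {
                spillWriter.flush();
                all.addAll(Files.readAllLines(spillFile.toPath(), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return all;
    }
}
```

The point of the design is that the bag decides for itself when to go to disk, at add time, instead of waiting for a memory manager to ask it to spill.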

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-25 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759645#action_12759645
 ] 

Pradeep Kamath commented on PIG-975:


I think it might be a good idea to have a config parameter (maybe a java -D 
property) that lets users choose between spillableBagForReduce and 
NonSpillableBagForReduce, with the non-spillable one as the default. That way, 
if users find the spillable bag better for their query for some reason, they 
can use it.
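
A minimal sketch of what such a switch could look like, assuming a property read from java.util.Properties (which is where -D values end up). The property name example.reduce.bag, the enum, and the accepted values are made up for illustration; none of them are actual Pig configuration keys.

```java
import java.util.Properties;

// Hypothetical sketch: the property name and values are illustrative only.
class ReduceBagChoiceSketch {
    enum BagKind { SPILLABLE, NON_SPILLABLE }

    // Assumed property name, e.g. passed as -Dexample.reduce.bag=spillable
    static final String PROPERTY = "example.reduce.bag";

    // The non-spillable bag is the default; only an explicit "spillable" switches.
    static BagKind choose(Properties props) {
        String value = props.getProperty(PROPERTY, "nonspillable");
        return "spillable".equalsIgnoreCase(value.trim())
                ? BagKind.SPILLABLE
                : BagKind.NON_SPILLABLE;
    }

    // Convenience wrapper that reads the JVM system properties.
    static BagKind chooseFromSystem() {
        return choose(System.getProperties());
    }
}
```

Any unrecognized value falls back to the non-spillable bag, so a typo in the property cannot silently re-enable the old behaviour.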

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3





[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-25 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759681#action_12759681
 ] 

Ying He commented on PIG-975:
-

I think this is too implementation-specific to expose to end users. Frankly, I 
don't think users care which class we use for the data bags. 

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3





[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759284#action_12759284
 ] 

Olga Natkovich commented on PIG-975:


A couple of questions and comments on the patch:

- Why do we need to synchronize in add? Who else is accessing the bag, since it 
is no longer managed by the spillable manager?
- Memory fraction should be a java property so that users can control it if 
they so choose.
- Why do we have a limit of only 100 tuples in memory, since we already have a 
memory limit? Also, if we do need it, shouldn't it be configurable?

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: PIG-975.patch, PIG-975.patch2





[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759299#action_12759299
 ] 

Ying He commented on PIG-975:
-

Answers to Olga's questions:

1. The synchronization can be removed. 
2. Memory fraction is configurable. The property name is 
pig.cachedbag.memusage, and the default value is 0.5.
3. The first 100 tuples are used to calculate the tuple size in memory, to 
determine how many tuples can fit into the configured memusage. It is not the 
number of tuples kept in memory.
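
The calculation in points 2 and 3 can be sketched roughly as follows. This is an assumption-laden illustration, not the patch code: the class and method names are hypothetical, and how the patch measures the size of each sampled tuple is not stated in this thread, so the sample sizes are simply passed in as byte counts. Only the property name pig.cachedbag.memusage, its default of 0.5, and the sample of 100 tuples come from the comment above.

```java
// Hypothetical sketch of the capacity estimate; names are illustrative.
class CachedBagCapacitySketch {
    static final String MEMUSAGE_PROPERTY = "pig.cachedbag.memusage"; // named in the thread
    static final double DEFAULT_MEMUSAGE = 0.5;                       // default stated in the thread
    static final int SAMPLE_SIZE = 100;                               // tuples sampled, per the thread

    // Assumed way of reading the fraction: a JVM system property with a fallback.
    static double configuredMemUsage() {
        String raw = System.getProperty(MEMUSAGE_PROPERTY);
        return raw == null ? DEFAULT_MEMUSAGE : Double.parseDouble(raw);
    }

    // Estimate how many tuples fit in (maxHeapBytes * memUsage), using the
    // average size of the sampled tuples as the per-tuple cost.
    static long tuplesThatFit(long maxHeapBytes, double memUsage, long[] sampleSizesInBytes) {
        if (sampleSizesInBytes.length == 0) {
            throw new IllegalArgumentException("need at least one sampled tuple");
        }
        long total = 0;
        for (long size : sampleSizesInBytes) {
            total += size;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("sampled sizes must be positive");
        }
        double averageSize = (double) total / sampleSizesInBytes.length;
        double budgetBytes = maxHeapBytes * memUsage;
        return (long) (budgetBytes / averageSize);
    }
}
```

In a real run the heap figure would presumably come from something like Runtime.getRuntime().maxMemory(), and the sample array would hold the sizes of the first SAMPLE_SIZE tuples added to the bag.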

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: PIG-975.patch, PIG-975.patch2





[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759322#action_12759322
 ] 

Olga Natkovich commented on PIG-975:


OK, then let's remove the synchronization. The rest looks good. Could you also 
post the perf numbers that you see with this change?

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: PIG-975.patch, PIG-975.patch2

