[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760485#action_12760485 ]

Hadoop QA commented on PIG-975:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420603/PIG-975.patch4
against trunk revision 819691.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The applied patch generated 406 javac compiler warnings (more than the trunk's current 403 warnings).
    -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
    -1 release audit. The applied patch generated 278 release audit warnings (more than the trunk's current 277 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/console

This message is automatically generated.

Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
---------------------------------------------------------------------------------------------

    Key: PIG-975
    URL: https://issues.apache.org/jira/browse/PIG-975
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.4.0
    Reporter: Ying He
    Assignee: Ying He
    Fix For: 0.6.0
    Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, PIG-975.patch3, PIG-975.patch4

POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and is prone to OutOfMemoryException. It is better to proactively manage memory usage: the bag fills memory up to a specified amount and dumps the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
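The behaviour requested in the issue description (fill memory up to a fixed amount, then write everything else to disk instead of waiting for a memory manager to ask for a spill) can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name ProactiveSpillBag, the use of String in place of Pig tuples, and a limit expressed as a tuple count are all assumptions made for illustration.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of a proactively spilling bag; not Pig's actual class.
// Strings stand in for tuples, and the limit is a tuple count (an assumption).
class ProactiveSpillBag {
    private final int memLimit;                       // max tuples held in memory
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;                           // created lazily on first spill
    private DataOutputStream spillOut;
    private long spilled = 0;

    ProactiveSpillBag(int memLimit) {
        if (memLimit < 0) throw new IllegalArgumentException("memLimit must be >= 0");
        this.memLimit = memLimit;
    }

    // Keep the tuple in memory while under the limit; otherwise append it to disk.
    void add(String tuple) {
        if (inMemory.size() < memLimit) {
            inMemory.add(tuple);
            return;
        }
        try {
            if (spillOut == null) {
                spillFile = File.createTempFile("bag", ".spill");
                spillFile.deleteOnExit();
                spillOut = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(spillFile)));
            }
            spillOut.writeUTF(tuple);
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long size() { return inMemory.size() + spilled; }

    long spilledCount() { return spilled; }

    // Demo helper: returns in-memory tuples followed by spilled ones.
    // A real bag would stream the spilled part through an iterator instead.
    List<String> readAll() {
        List<String> all = new ArrayList<>(inMemory);
        if (spillOut == null) return all;
        try {
            spillOut.flush();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)))) {
                for (long i = 0; i < spilled; i++) all.add(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return all;
    }

    public static void main(String[] args) {
        ProactiveSpillBag bag = new ProactiveSpillBag(2);
        for (String t : new String[] {"a", "b", "c", "d"}) bag.add(t);
        System.out.println("size=" + bag.size() + " spilled=" + bag.spilledCount());
    }
}
```

Because the bag decides for itself when to write to disk, nothing outside it needs to call back into it under memory pressure, which is the point of not registering with SpillableMemoryManager.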
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759645#action_12759645 ]

Pradeep Kamath commented on PIG-975:
------------------------------------

I think it might be a good idea to have a config parameter (maybe a java -D property) that lets users choose between spillableBagForReduce and NonSpillableBagForReduce, with the non-spillable one being the default. This way, if for some reason users find the spillable bag better for their query, they can use it.

Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
---------------------------------------------------------------------------------------------

    Key: PIG-975
    URL: https://issues.apache.org/jira/browse/PIG-975
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.2.0
    Reporter: Ying He
    Assignee: Ying He
    Fix For: 0.2.0
    Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, PIG-975.patch3

POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and is prone to OutOfMemoryException. It is better to proactively manage memory usage: the bag fills memory up to a specified amount and dumps the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.
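For illustration only, the -D switch proposed in this comment might look roughly like the sketch below. The property name pig.reduce.bag.spillable, the class ReduceBagSelector, and the BagKind enum are invented for this example and do not appear in the thread or in the patches; only the idea of a Java system property with the non-spillable bag as the default comes from the comment.

```java
// Hypothetical sketch of the proposed -D switch; all names are illustrative.
class ReduceBagSelector {
    // Assumed property name; the thread does not name one.
    static final String PROP = "pig.reduce.bag.spillable";

    enum BagKind { SPILLABLE, NON_SPILLABLE }

    // An unset (null) or non-"true" value falls back to the non-spillable default.
    static BagKind choose(String propertyValue) {
        return Boolean.parseBoolean(propertyValue)
                ? BagKind.SPILLABLE
                : BagKind.NON_SPILLABLE;
    }

    public static void main(String[] args) {
        // e.g. java -Dpig.reduce.bag.spillable=true ReduceBagSelector
        System.out.println(choose(System.getProperty(PROP)));
    }
}
```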
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759681#action_12759681 ]

Ying He commented on PIG-975:
-----------------------------

I think this is too implementation-specific to expose to end users. Frankly, I don't think users care which class we use for the data bags.
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759284#action_12759284 ]

Olga Natkovich commented on PIG-975:
------------------------------------

A couple of questions/comments on the patch:

- Why do we need to synchronize in add? Who else is accessing the bag, since it is no longer managed by the spillable manager?
- Memory fraction should be a java property so that users can control it if they so choose.
- Why do we have a limit of only 100 tuples in memory, since we already have a memory limit? Also, if we do need it, shouldn't it be configurable?

Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
---------------------------------------------------------------------------------------------

    Key: PIG-975
    URL: https://issues.apache.org/jira/browse/PIG-975
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.2.0
    Reporter: Ying He
    Assignee: Ying He
    Fix For: 0.2.0
    Attachments: PIG-975.patch, PIG-975.patch2

POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and is prone to OutOfMemoryException. It is better to proactively manage memory usage: the bag fills memory up to a specified amount and dumps the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759299#action_12759299 ]

Ying He commented on PIG-975:
-----------------------------

Answers to Olga's questions:

1. The synchronization can be removed.
2. Memory fraction is configurable. The property name is pig.cachedbag.memusage; the default value is 0.5.
3. The first 100 tuples are used to calculate the tuple size in memory, which determines how many tuples fit into the configured memusage. It is not the number of tuples kept in memory.
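The calculation described in answers 2 and 3 can be sketched as follows. The property name pig.cachedbag.memusage, its default of 0.5, and the 100-tuple sample come from the comment above. Everything else is an assumption for illustration: the class and method names are hypothetical, the fraction is taken to be a share of the JVM's maximum heap, the property is read with System.getProperty, and the 200-byte average tuple size is a made-up sample figure rather than a measurement.

```java
// Hypothetical sketch of the sizing logic; not the code from the patch.
class CachedBagLimit {
    static final int SAMPLE_SIZE = 100;   // tuples sampled to estimate size (from the thread)

    // Bytes the bag may use: memUsage fraction of the heap (the heap basis is an assumption).
    static long memoryBudget(long maxHeapBytes, double memUsage) {
        if (memUsage <= 0.0 || memUsage > 1.0) {
            throw new IllegalArgumentException("memUsage must be in (0, 1]");
        }
        return (long) (maxHeapBytes * memUsage);
    }

    // How many tuples fit in the budget, given the measured size of the sample.
    static long tupleLimit(long sampledBytes, int sampledTuples, long budgetBytes) {
        if (sampledBytes <= 0 || sampledTuples <= 0) {
            throw new IllegalArgumentException("sample must be non-empty");
        }
        long avgTupleBytes = Math.max(1, sampledBytes / sampledTuples);
        return budgetBytes / avgTupleBytes;
    }

    public static void main(String[] args) {
        double memUsage = Double.parseDouble(
                System.getProperty("pig.cachedbag.memusage", "0.5"));
        long budget = memoryBudget(Runtime.getRuntime().maxMemory(), memUsage);
        // Pretend the first 100 tuples measured 200 bytes each (illustrative figure).
        long limit = tupleLimit(SAMPLE_SIZE * 200L, SAMPLE_SIZE, budget);
        System.out.println("budget=" + budget + " bytes, in-memory tuple limit=" + limit);
    }
}
```

Under these assumptions the sample only calibrates the limit; the number of tuples actually kept in memory is the budget divided by the average tuple size, which is why 100 is not itself the cap.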
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759322#action_12759322 ]

Olga Natkovich commented on PIG-975:
------------------------------------

Ok, then let's remove the synchronization. The rest looks good. Could you also post the perf numbers that you see with this change?