[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893589#action_12893589 ]
Hadoop QA commented on PIG-1516:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12450778/PIG-1516.2.patch
  against trunk revision 980276.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 402 release audit warnings (more than the trunk's current 400 warnings).

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/364/console

This message is automatically generated.

> finalize in bag implementations causes pig to run out of memory in reduce
> --------------------------------------------------------------------------
>
>                 Key: PIG-1516
>                 URL: https://issues.apache.org/jira/browse/PIG-1516
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1516.2.patch, PIG-1516.patch
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag implement
> finalize methods.
> As a result, the garbage collector moves them to a finalization queue, and
> the memory they use is freed only after they have been finalized.
> If the bags are not finalized fast enough, the finalization queue consumes a
> lot of memory, and Pig runs out of memory. This can happen when a large
> number of small bags is being created.
> *Solution:*
> The finalize method exists only to delete the spill files that are created
> when a bag is too large. If a bag is small enough, no spill files are
> created, and the finalize method serves no purpose.
> A new class that holds a list of files (FileList) will be introduced. This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and they will use FileList instead of
> ArrayList<File>.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround for earlier releases:
> disabling the combiner reduces the number of bags created, because the
> stage that combines intermediate merge results is skipped. I would
> recommend disabling it only if you have this problem, as it is likely to
> slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
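
The FileList approach described under *Solution* can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: only the name FileList comes from the issue, while SketchBag, all method names, and the choice to create the FileList lazily on the first spill are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative stand-in for the FileList class proposed in the issue.
 * Only the class name is taken from the issue; the methods are assumptions.
 */
class FileList {
    private final List<File> files = new ArrayList<>();

    void add(File f) {
        files.add(f);
    }

    int size() {
        return files.size();
    }

    /** Deletes every tracked file and forgets it; safe to call repeatedly. */
    void deleteAll() {
        for (File f : files) {
            f.delete(); // best effort: an already-missing file is not an error
        }
        files.clear();
    }

    /** Safety net: only this small holder is finalizable, not the bag itself. */
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        deleteAll();
    }
}

/** Hypothetical bag with no finalize method of its own. */
class SketchBag {
    private final List<String> tuples = new ArrayList<>();
    private FileList spillFiles; // assumption: stays null until the first spill

    void add(String tuple) {
        tuples.add(tuple);
    }

    /** Writes the in-memory tuples to a temporary spill file and tracks it. */
    File spill() {
        try {
            if (spillFiles == null) {
                spillFiles = new FileList();
            }
            File f = File.createTempFile("sketchbag", ".spill");
            Files.write(f.toPath(), tuples);
            tuples.clear();
            spillFiles.add(f);
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasSpilled() {
        return spillFiles != null;
    }

    /** Drops the contents and eagerly removes any spill files. */
    void clear() {
        tuples.clear();
        if (spillFiles != null) {
            spillFiles.deleteAll();
            spillFiles = null;
        }
    }
}
```

In this sketch a bag that never spills allocates no finalizable object at all, so the garbage collector can reclaim it without a pass through the finalization queue. A bag that does spill leaves only the small FileList to be finalized.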