[ https://issues.apache.org/jira/browse/PIG-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892551#action_12892551 ]
Thejas M Nair commented on PIG-1519:
------------------------------------

As part of these changes, we should consider keeping a (weak?) reference in the bags to all the iterators that have been created, and calling clear() (a new method in the iterator implementation class) to close the DataInputStreams and invalidate the iterators.

> Stop relying on finalize() to delete files, close filehandles in bag implementations
> ------------------------------------------------------------------------------------
>
> Key: PIG-1519
> URL: https://issues.apache.org/jira/browse/PIG-1519
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Priority: Minor
>
> In DefaultAbstractBag and its subclasses, the files used for spilling to disk
> are deleted using finalize().
> The iterators associated with these bags use DataInputStreams but don't call
> close() on them, so the underlying FileInputStream.close() is called only
> through FileInputStream.finalize().
> The use of finalize() has performance implications and also makes it hard to
> predict when the resources will be freed.
> WeakReferences can be used to avoid the use of finalize(). See
> http://java.sun.com/developer/technicalArticles/javase/finalization/ (look
> for "An Alternative to Finalization").
> I have marked the priority as minor because the resource objects that have
> finalize() are allocated only for large bags that spill to disk (see related
> jira - PIG-1516), so the performance impact of the use of finalize() is not
> likely to be significant. Also, I haven't come across any case where we have
> run out of these resources because the finalizer thread has not freed them yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
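For illustration only, the suggestion in the comment (bags track their iterators through weak references and close them explicitly) might look roughly like the sketch below. It is not Pig code: SpillBagSketch, SpillIterator, spillFile() and isClosed() are hypothetical names, the bag holds plain ints instead of Tuples, and only clear() follows the method name proposed in the comment.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a bag that has spilled to disk (not DefaultAbstractBag).
class SpillBagSketch implements Iterable<Integer> {
    private final File spillFile;
    // Weak references: the bag must not keep abandoned iterators alive.
    private final List<WeakReference<SpillIterator>> iterators = new ArrayList<>();

    SpillBagSketch(int... values) {
        try {
            spillFile = File.createTempFile("bag-sketch", ".spill");
            try (DataOutputStream out =
                     new DataOutputStream(new FileOutputStream(spillFile))) {
                out.writeInt(values.length);
                for (int v : values) out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Iterator<Integer> iterator() {
        try {
            SpillIterator it = new SpillIterator(spillFile);
            iterators.add(new WeakReference<>(it));
            return it;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Explicit cleanup instead of finalize(): close every iterator that is
    // still reachable, then delete the spill file. Returns true if deleted.
    boolean clear() {
        for (WeakReference<SpillIterator> ref : iterators) {
            SpillIterator it = ref.get();
            if (it != null) it.clear();
        }
        iterators.clear();
        return spillFile.delete();
    }

    File spillFile() { return spillFile; }
}

class SpillIterator implements Iterator<Integer> {
    private DataInputStream in; // null once the iterator has been invalidated
    private int remaining;

    SpillIterator(File file) throws IOException {
        in = new DataInputStream(new FileInputStream(file));
        try {
            remaining = in.readInt();
        } catch (IOException e) {
            in.close();
            throw e;
        }
    }

    @Override
    public boolean hasNext() { return in != null && remaining > 0; }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        try {
            int value = in.readInt();
            remaining--;
            if (remaining == 0) clear(); // close eagerly once exhausted
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Closes the stream and invalidates the iterator; safe to call twice.
    void clear() {
        if (in == null) return;
        try { in.close(); } catch (IOException ignored) { }
        in = null;
    }

    boolean isClosed() { return in == null; }
}
```

With something along these lines, a caller that is done with a bag would call clear() once; any iterator still in use reports hasNext() == false afterwards, and neither the stream nor the spill file depends on the finalizer thread.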