Hi,

On Tue, Aug 30, 2011 at 9:15 PM, Mark Kerzner <[email protected]> wrote:
> this happens when the eml files has attachments. As we know, Tika extracts
> text from the attachments (which is great, this is what I need), but it
> seems like it does not close those attachments, although it does delete
> them.

Yes, I think you're right. I believe the problem here is the openContainer
field within TikaInputStream, where the container-aware type detection code
stores the already opened container (in this case an NPOIFSFileSystem
object) to avoid having to duplicate the parsing work. Unfortunately there's
no mechanism (except garbage collection by the JVM) by which the container
object gets properly disposed of when it's no longer needed, and I believe
this is what's preventing the underlying temporary files from getting
reclaimed.

Perhaps we should extend the current TemporaryFiles mechanism to a more
generic TemporaryResources class that could also take care of properly
disposing of non-file resources associated with a TikaInputStream instance.

BR,

Jukka Zitting
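
The idea of a generic TemporaryResources class could be sketched roughly as
follows. This is only an illustration and not Tika code: the class name
TemporaryResourcesSketch and its methods addResource and createTemporaryFile
are hypothetical, and the sketch assumes every tracked resource can be
wrapped as a java.io.Closeable. Resources are released in reverse order of
registration, so a container opened on a temporary file is closed before
that file is deleted.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: tracks temporary files and other disposable
// resources tied to one stream and releases them all together.
final class TemporaryResourcesSketch implements Closeable {

    // Used as a stack, so the most recently added resource is closed first.
    private final Deque<Closeable> resources = new ArrayDeque<>();

    // Registers any non-file resource, e.g. an already opened container.
    void addResource(Closeable resource) {
        resources.push(resource);
    }

    // Creates a temporary file and registers its deletion as a resource.
    File createTemporaryFile() throws IOException {
        final File file = File.createTempFile("sketch-", ".tmp");
        addResource(() -> {
            if (file.exists() && !file.delete()) {
                throw new IOException("Could not delete " + file);
            }
        });
        return file;
    }

    // Closes everything, remembering the first failure and attaching
    // later ones to it, so one bad resource does not leak the others.
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With something like this, the detection code would register the opened
container right after creating it, and closing the stream would call
close() on the tracker instead of relying on garbage collection.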
