Jukka, as a user of Tika, I would welcome this enhancement. Actually, the files are being deleted; it's just that the file handles in the Java code are not being closed.
For the time being, is there a workaround that I could use? Right now, this is a show-stopper for my application (open source eDiscovery - FreeEed <http://freeeed.org/>).

Thank you,
Mark

On Tue, Aug 30, 2011 at 4:19 PM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Tue, Aug 30, 2011 at 9:15 PM, Mark Kerzner <[email protected]> wrote:
> > this happens when the eml files has attachments. As we know, Tika extracts
> > text from the attachments (which is great, this is what I need), but it
> > seems like it does not close those attachments, although it does delete
> > them.
>
> Yes, I think you're right. I believe the problem here is the
> openContainer field within TikaInputStream where the container-aware
> type detection code stores the already opened container (in this case
> an NPOIFSFileSystem object) to avoid having to duplicate the parsing
> work. Unfortunately there's no mechanism (except garbage collection by
> the JVM) by which the container object gets properly disposed when
> it's no longer needed, and I believe this is what's preventing the
> underlying temporary files from getting reclaimed.
>
> Perhaps we should extend the current TemporaryFiles mechanism to a
> more generic TemporaryResources class that could also take care of
> properly disposing also non-file resources associated with a
> TikaInputStream instance.
>
> BR,
>
> Jukka Zitting
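
To make the proposal in the quoted message concrete, here is a rough sketch of a holder that tracks both temporary files and non-file resources (such as an opened container) and releases them all in one call. This is only an illustration under my own assumptions: the class name TemporaryResourcesSketch and the methods createTemporaryFile and addResource are hypothetical, and none of this is Tika's actual API or implementation.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only: not the actual Tika class or API.
public class TemporaryResourcesSketch implements Closeable {

    // Kept as a stack so resources are released in reverse order of
    // registration (most recently registered first).
    private final Deque<Closeable> resources = new ArrayDeque<>();

    // Creates a temporary file that is deleted when close() is called.
    public File createTemporaryFile() throws IOException {
        final File file = File.createTempFile("sketch-", ".tmp");
        resources.push(() -> {
            if (file.exists() && !file.delete()) {
                throw new IOException("Could not delete " + file);
            }
        });
        return file;
    }

    // Registers any non-file resource, for example an opened container
    // object that holds a handle on one of the temporary files.
    public void addResource(Closeable resource) {
        resources.push(resource);
    }

    // Closes every registered resource, even if some of them fail.
    // The first failure is rethrown; later ones are attached as suppressed.
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

The reverse-order release is deliberate: a container registered after its backing temporary file gets closed first, so the file handle is released before the file itself is deleted.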
