Hi,

On Tue, Aug 30, 2011 at 9:15 PM, Mark Kerzner <[email protected]> wrote:
> this happens when the eml files has attachments. As we know, Tika extracts
> text from the attachments (which is great, this is what I need), but it
> seems like it does not close those attachments, although it does delete
> them.

Yes, I think you're right. I believe the problem here is the
openContainer field within TikaInputStream where the container-aware
type detection code stores the already opened container (in this case
an NPOIFSFileSystem object) to avoid having to duplicate the parsing
work. Unfortunately there's no mechanism (except garbage collection by
the JVM) by which the container object gets properly disposed when
it's no longer needed, and I believe this is what's preventing the
underlying temporary files from getting reclaimed.

Perhaps we should extend the current TemporaryFiles mechanism to a
more generic TemporaryResources class that could also take care of
properly disposing of non-file resources associated with a
TikaInputStream instance.
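
To make the idea a bit more concrete, here is a rough sketch of what
such a class could look like. It is only an illustration, not existing
Tika code: the class name TemporaryResourcesSketch and the methods
addResource() and createTemporaryFile() are assumptions of mine.
Resources are released in reverse order of registration, so an opened
container gets closed before the temporary file underneath it is
deleted.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a generic resource tracker. Names and
 * behaviour are assumptions for illustration, not Tika's actual API.
 */
class TemporaryResourcesSketch implements Closeable {

    // Used as a stack: the most recently registered resource is closed first.
    private final Deque<Closeable> resources = new ArrayDeque<>();

    /** Registers any closeable resource, for example an opened container. */
    void addResource(Closeable resource) {
        resources.push(resource);
    }

    /** Creates a temporary file that is deleted when close() is called. */
    File createTemporaryFile() throws IOException {
        final File file = File.createTempFile("sketch-", ".tmp");
        addResource(() -> {
            if (file.exists() && !file.delete()) {
                throw new IOException("Could not delete " + file);
            }
        });
        return file;
    }

    /**
     * Closes all registered resources in reverse registration order.
     * Every resource is attempted even if an earlier one fails; the
     * first failure is rethrown with later ones attached as suppressed.
     */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With something like this, the detection code would register the opened
container right after creating it (through a small adapter if the
container type does not implement Closeable), and closing the stream
would release everything in one place instead of relying on garbage
collection.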

BR,

Jukka Zitting
