Jukka,

as a user of Tika, I would welcome this enhancement. Actually, the files are
being deleted; it's just that the file handles in the Java code are not being
closed.

For the time being, is there a workaround I could use? Right now, this
is a show-stopper for my application (FreeEed <http://freeeed.org/>, an
open source eDiscovery tool).

Thank you,
Mark

On Tue, Aug 30, 2011 at 4:19 PM, Jukka Zitting wrote:

> Hi,
>
> On Tue, Aug 30, 2011 at 9:15 PM, Mark Kerzner wrote:
> > This happens when the eml files have attachments. As we know, Tika
> > extracts text from the attachments (which is great, this is what I need),
> > but it seems it does not close those attachments, although it does delete
> > them.
>
> Yes, I think you're right. I believe the problem here is the
> openContainer field within TikaInputStream where the container-aware
> type detection code stores the already opened container (in this case
> an NPOIFSFileSystem object) to avoid having to duplicate the parsing
> work. Unfortunately there's no mechanism (except garbage collection by
> the JVM) by which the container object gets properly disposed of when
> it's no longer needed, and I believe this is what's preventing the
> underlying temporary files from getting reclaimed.
>
> Perhaps we should extend the current TemporaryFiles mechanism into a
> more generic TemporaryResources class that could also take care of
> properly disposing of non-file resources associated with a
> TikaInputStream instance.
>
> BR,
>
> Jukka Zitting
>
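Jukka's idea of generalizing TemporaryFiles into a TemporaryResources class can be illustrated with a minimal, self-contained Java sketch. The class name TemporaryResourcesSketch and its methods addResource and createTemporaryFile are hypothetical, chosen for illustration only; they are not taken from Tika's code. The assumption is simply that anything tied to a TikaInputStream, whether a temporary file or an already opened container such as the NPOIFSFileSystem mentioned above, is registered with one owner and released deterministically instead of waiting for garbage collection.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a "TemporaryResources"-style helper (NOT Tika's
 * actual class): it tracks temporary files and arbitrary Closeable
 * resources and disposes of all of them when close() is called.
 */
class TemporaryResourcesSketch implements Closeable {

    // Used as a stack so resources are released in reverse order of
    // registration (last registered, first closed).
    private final Deque<Closeable> resources = new ArrayDeque<>();

    /** Registers any Closeable (e.g. an opened container) for later disposal. */
    public void addResource(Closeable resource) {
        resources.push(resource);
    }

    /** Creates a temporary file whose deletion is itself a tracked resource. */
    public File createTemporaryFile() throws IOException {
        File file = File.createTempFile("tika-sketch-", ".tmp");
        addResource(() -> {
            if (file.exists() && !file.delete()) {
                throw new IOException("Could not delete " + file);
            }
        });
        return file;
    }

    /**
     * Closes every registered resource, even if some of them fail; the first
     * failure is rethrown with any later ones attached as suppressed.
     */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Closing in reverse registration order means that a container opened on top of a temporary file is closed before that file is deleted. This ordering matters on platforms such as Windows, where deleting a file that still has an open handle typically fails. A caller would wrap the parse in `try (TemporaryResourcesSketch tmp = new TemporaryResourcesSketch()) { ... }` so that cleanup happens even when parsing throws.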
