Hi,

On Tue, Aug 30, 2011 at 11:19 PM, Jukka Zitting <[email protected]> wrote:
> Yes, I think you're right. I believe the problem here is the
> openContainer field within TikaInputStream where the container-aware
> type detection code stores the already opened container (in this case
> an NPOIFSFileSystem object) to avoid having to duplicate the parsing
> work. Unfortunately there's no mechanism (except garbage collection by
> the JVM) by which the container object gets properly disposed when
> it's no longer needed, and I believe this is what's preventing the
> underlying temporary files from getting reclaimed.

Actually the problem was much simpler than that. Code within the
ParserContainerExtractor class creates a new TikaInputStream for
processing an embedded resource, but then never closes that stream.
This prevents the temporary file behind that stream from being removed
on Windows.
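
As a rough illustration of the failure mode (a standalone sketch, not part of
the patch and not Tika code): on Windows, deleting a file usually fails while
a stream opened through java.io still holds it, so the stream has to be closed
before the temporary file can go away. The class and method names below are
hypothetical, and a plain FileInputStream stands in for TikaInputStream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseBeforeDelete {

    /** Hypothetical stand-in for the handler: drains the stream and counts bytes. */
    static long consume(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    /**
     * Reads the file through a stream that is closed in a finally block,
     * then tries to delete the file. Returns the result of File.delete().
     */
    static boolean processAndDelete(File file) throws IOException {
        InputStream input = new FileInputStream(file);
        try {
            consume(input);
        } finally {
            // Without this close(), the delete below would be expected
            // to fail on Windows because the file is still open.
            input.close();
        }
        return file.delete();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("embedded-", ".tmp");
        System.out.println("deleted: " + processAndDelete(file));
    }
}
```

The try/finally form mirrors the shape of the fix; on Java 7 or later a
try-with-resources block would do the same job.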

See the attached patch for a quick draft of a fix.

BR,

Jukka Zitting
From ab24238e91f038bdb579e7a2f38aecdde1263787 Mon Sep 17 00:00:00 2001
From: Jukka Zitting <[email protected]>
Date: Wed, 31 Aug 2011 18:37:56 +0200
Subject: [PATCH] Properly close the temporary TikaInputStream created by the
 ParserContainerExtractor class

---
 .../tika/extractor/ParserContainerExtractor.java   |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tika-core/src/main/java/org/apache/tika/extractor/ParserContainerExtractor.java b/tika-core/src/main/java/org/apache/tika/extractor/ParserContainerExtractor.java
index 6091ae7..24f0d14 100644
--- a/tika-core/src/main/java/org/apache/tika/extractor/ParserContainerExtractor.java
+++ b/tika-core/src/main/java/org/apache/tika/extractor/ParserContainerExtractor.java
@@ -121,8 +121,13 @@ public class ParserContainerExtractor implements ContainerExtractor {
                     // Use a temporary file to process the stream twice
                     File file = tis.getFile();
 
-                    // Let the handler process the embedded resource 
-                    handler.handle(filename, type, TikaInputStream.get(file));
+                    // Let the handler process the embedded resource
+                    InputStream input = TikaInputStream.get(file);
+                    try {
+                        handler.handle(filename, type, input);
+                    } finally {
+                        input.close();
+                    }
 
                     // Recurse
                     extractor.extract(tis, extractor, handler);
-- 
1.7.4.4
