I started trying to implement this, but obviously it's a bit more complex than a 10-minute job and requires knowledge of the design.
Mark

On Tue, Aug 30, 2011 at 11:27 PM, Mark Kerzner wrote:

> Guys,
>
> the errors show up again. I already thanked everybody
> <http://shmsoft.blogspot.com/2011/08/freeeed-processing-is-stable.html>!
> I wonder how I can make good on this :)
>
> I think that in ParserContainerExtractor.parse you need to associate
> TikaInputStream with temporary files both ways: from the stream you can
> already find the file, but you should also be able to find the stream from
> the file. Then, when you are deleting the file, you can also close the
> associated stream. Something like
>
>     public void parse(
>             InputStream stream, ContentHandler ignored,
>             Metadata metadata, ParseContext context)
>             throws IOException, SAXException, TikaException {
>         TemporaryFiles tmp = new TemporaryFiles();
>         try {
>             TikaInputStream tis = TikaInputStream.get(stream, tmp);
>
>             // Figure out what we have to process
>             String filename = metadata.get(Metadata.RESOURCE_NAME_KEY);
>             MediaType type = detector.detect(tis, metadata);
>
>             if (extractor == null) {
>                 // Let the handler process the embedded resource
>                 handler.handle(filename, type, tis);
>             } else {
>                 // Use a temporary file to process the stream twice
>                 File file = tis.getFile();
>
>                 // Let the handler process the embedded resource
>                 handler.handle(filename, type,
>                         TikaInputStream.get(file));
>
>                 // Recurse
>                 extractor.extract(tis, extractor, handler);
>             }
>         } finally {
>             // Close the streams tied to the temporary files, then delete the files
>             tmp.closeStreams();
>             tmp.dispose();
>         }
>     }
>
> Thank you,
> Mark
>
>
> On Tue, Aug 30, 2011 at 9:43 PM, Mark Kerzner wrote:
>
>> Okay,
>>
>> the error was there because of Java 7. I heard about some weird Java 7
>> error and Lucene. Back to Java 6, and everything works fine: builds,
>> extracts, closes files.
>>
>> Thank you,
>> Mark
>>
>>
>> On Tue, Aug 30, 2011 at 9:10 PM, Mark Kerzner wrote:
>>
>>> Well,
>>>
>>> that error WAS important. It compiles and pretends to work, but does not
>>> extract any text or metadata (that's why it is so fast!).
>>>
>>> Thank you,
>>> Mark
>>>
>>>
>>> On Tue, Aug 30, 2011 at 7:20 PM, Mark Kerzner wrote:
>>>
>>>> I do get an error in the build, but it creates the core snapshot jar
>>>> anyway. Should I be concerned?
>>>>
>>>> Thank you,
>>>> Mark
>>>>
>>>> [INFO] -------------------------------------------------------------
>>>> [ERROR] COMPILATION ERROR :
>>>> [INFO] -------------------------------------------------------------
>>>> [ERROR] /home/mark/ThirdParty/tika-source/tika-site/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java:[89,34] error: cannot access JPEGDecodeParam
>>>> [INFO] 1 error
>>>> [INFO] -------------------------------------------------------------
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Reactor Summary:
>>>> [INFO]
>>>> [INFO] Apache Tika parent ................................ SUCCESS [32.118s]
>>>> [INFO] Apache Tika core .................................. SUCCESS [15.994s]
>>>> [INFO] Apache Tika parsers ............................... FAILURE [57.498s]
>>>> [INFO] Apache Tika application ........................... SKIPPED
>>>> [INFO] Apache Tika OSGi bundle ........................... SKIPPED
>>>> [INFO] Apache Tika ....................................... SKIPPED
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] BUILD FAILURE
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Total time: 2:23.922s
>>>> [INFO] Finished at: Tue Aug 30 18:52:20 CDT 2011
>>>> [INFO] Final Memory: 28M/156M
>>>> [INFO] ------------------------------------------------------------------------
>>>> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project tika-parsers: Compilation failure
>>>> [ERROR] /home/mark/ThirdParty/tika-source/tika-site/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java:[89,34] error: cannot access JPEGDecodeParam
>>>> [ERROR] -> [Help 1]
>>>> [ERROR]
>>>> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>>> [ERROR]
>>>> [ERROR] For more information about the errors and possible solutions, please read the following articles:
>>>> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>>>> [ERROR]
>>>> [ERROR] After correcting the problems, you can resume the build with the command
>>>> [ERROR]   mvn <goals> -rf :tika-parsers
>>>>
>>>>
>>>> On Tue, Aug 30, 2011 at 7:08 PM, Mark Kerzner wrote:
>>>>
>>>>> SUCCESS!!!!
>>>>>
>>>>> Nick, not only does it close all files, but it also seems to work much
>>>>> faster (I mean, in the debugger; real performance may vary :)
>>>>>
>>>>> Thank you everybody for today's productive discussion and help.
>>>>>
>>>>> Mark
>>>>>
>>>>> PS. If anyone ever gets sued, they should use FreeEed for eDiscovery
>>>>> and come back a winner!
>>>>>
>>>>>
>>>>> On Tue, Aug 30, 2011 at 5:25 PM, Nick Burch wrote:
>>>>>
>>>>>> On Tue, 30 Aug 2011, Mark Kerzner wrote:
>>>>>>
>>>>>>> For the time being, is there a workaround that I could use? Right
>>>>>>> now, this is a show-stopper for my application
>>>>>>
>>>>>> Any chance you could do a svn checkout, build, and try with that?
>>>>>> After my last email, I have a nagging feeling about the timing of
>>>>>> making NPOIFS implement closable... I upgraded the POI dependency
>>>>>> earlier today, so it's worth checking with
>>>>>>
>>>>>> Nick
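
For reference, the "both ways" bookkeeping suggested in the thread could look roughly like the sketch below. This is only an illustration under stated assumptions: TrackedTemporaryFiles and its methods are hypothetical names made up here and are not part of Tika, and the sketch does not reflect how Tika's own temporary-file handling is actually designed. Each temporary file remembers the streams opened on it, so that disposing of the file can close those streams first.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Tika class): remembers which streams are open on which temp file. */
final class TrackedTemporaryFiles {

    // File -> streams opened on it; this is the "find the stream from the file" direction.
    private final Map<File, List<Closeable>> streamsByFile =
            new LinkedHashMap<File, List<Closeable>>();

    /** Creates a new temporary file and starts tracking it. */
    File createTemporaryFile() throws IOException {
        File file = File.createTempFile("tracked-", ".tmp");
        streamsByFile.put(file, new ArrayList<Closeable>());
        return file;
    }

    /** Opens a stream on a tracked file and records the association. */
    InputStream open(File file) throws IOException {
        List<Closeable> streams = streamsByFile.get(file);
        if (streams == null) {
            throw new IllegalArgumentException("Not a tracked file: " + file);
        }
        InputStream in = new FileInputStream(file);
        streams.add(in);
        return in;
    }

    /** Closes every stream opened on each tracked file, then deletes the file. */
    void dispose() {
        for (Map.Entry<File, List<Closeable>> entry : streamsByFile.entrySet()) {
            for (Closeable stream : entry.getValue()) {
                try {
                    stream.close();
                } catch (IOException ignored) {
                    // Best effort: keep closing the remaining streams.
                }
            }
            entry.getKey().delete();
        }
        streamsByFile.clear();
    }

    public static void main(String[] args) throws IOException {
        TrackedTemporaryFiles tmp = new TrackedTemporaryFiles();
        File file = tmp.createTemporaryFile();
        tmp.open(file);   // stream is never closed by the caller
        tmp.dispose();    // closes the stream, then deletes the file
        System.out.println("file still exists: " + file.exists());
    }
}
```

The caller only needs to call dispose() in a finally block; any stream obtained through open() is closed there even if the code that used it forgot to.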
