I think yesterday's code introduced a bug. The temporary file that is created for ImageMagick is not there.
[main] INFO org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract is installed and is being invoked. This can add greatly to processing time. If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
magick: no images found for operation `-resize' at CLI arg 9 @ error/operation.c/CLIOption/5361.
[main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - ImageMagick failed (commandline: [magick, -density, 300, -depth, 4, -colorspace, gray, -filter, triangle, -resize, 200%, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp])
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
    at org.apache.tika.parser.ocr.ImagePreprocessor.process(ImagePreprocessor.java:121)
    at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:280)
    at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:248)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.image.AbstractImageParser.parse(AbstractImageParser.java:94)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.torchai.ImageMagick.parse(ImageMagick.java:43)
    at org.torchai.ImageMagick.main(ImageMagick.java:56)
Text: MARLEY was dead, to begin with.
There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it. And Scrooge's name was good upon 'Change, for anything he chose to put his hand to.

Here's the code:

    public static String parse(String file) throws TikaException, SAXException, IOException {
        final AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
        final ParseContext parseContext = new ParseContext();
        final TesseractOCRConfig tessConfig = new TesseractOCRConfig();
        parseContext.set(AutoDetectParser.class, parser);
        parseContext.set(TesseractOCRConfig.class, tessConfig);
        tessConfig.setEnableImageProcessing(true);

        ContentHandler contentHandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {
            parser.parse(stream, contentHandler, metadata, parseContext);
        }
        return contentHandler.toString();
    }
