K. Thank you. Will check.

On Wed, Feb 10, 2021 at 2:23 PM Peter Kronenberg <[email protected]> wrote:
> I have also noticed since yesterday that there are files in my temp
> directory that aren’t being cleaned up. All of these files contain the
> output of Tesseract
>
> *From:* Peter Kronenberg
> *Sent:* Wednesday, February 10, 2021 12:35 PM
> *To:* [email protected]
> *Subject:* Error calling ImageMagick
>
> I think yesterday’s code introduced a bug. The temporary file that is
> created for ImageMagick is not there.
>
> [main] INFO org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract is installed and is being invoked. This can add greatly to processing time. If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
> magick: no images found for operation `-resize' at CLI arg 9 @ error/operation.c/CLIOption/5361.
> [main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - ImageMagick failed (commandline: [magick, -density, 300, -depth, 4, -colorspace, gray, -filter, triangle, -resize, 200%, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp])
> org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
>     at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
>     at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
>     at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
>     at org.apache.tika.parser.ocr.ImagePreprocessor.process(ImagePreprocessor.java:121)
>     at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:280)
>     at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:248)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.parser.image.AbstractImageParser.parse(AbstractImageParser.java:94)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.torchai.ImageMagick.parse(ImageMagick.java:43)
>     at org.torchai.ImageMagick.main(ImageMagick.java:56)
>
> Text: MARLEY was dead, to begin with. There is no doubt whatever about
> that. The register of his burial was signed by the clergyman, the clerk,
> the undertaker, and the chief mourner. Scrooge signed it. And
> Scrooge’s name was good upon ’Change, for anything he chose to put
> his hand to.
>
> Here’s the code:
>
> public static String parse(String file) throws TikaException, SAXException, IOException {
>     final AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
>     final ParseContext parseContext = new ParseContext();
>     final TesseractOCRConfig tessConfig = new TesseractOCRConfig();
>     parseContext.set(AutoDetectParser.class, parser);
>     parseContext.set(TesseractOCRConfig.class, tessConfig);
>     tessConfig.setEnableImageProcessing(true);
>     ContentHandler contentHandler = new BodyContentHandler();
>     Metadata metadata = new Metadata();
>
>     try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {
>         parser.parse(stream, contentHandler, metadata, parseContext);
>     }
>
>     return contentHandler.toString();
> }
