No, I'm not seeing the leftover Tesseract files anymore. I thought they might
have been related to the ImageMagick issue, because both seemed to involve temp
files, but obviously that wasn't the case.
So I'm not sure what was causing it, but I don't see it anymore.

And thanks for the coding hint. I wasn't sure whether TikaInputStream did the
buffering automatically.
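On the buffering question: as far as I can tell from the Tika source, `TikaInputStream.get(InputStream)` itself wraps any stream whose `markSupported()` returns false in a `BufferedInputStream`, so an extra wrapper is redundant rather than harmful. A minimal JDK-only sketch of that check follows; `MarkSupport` and `ensureMarkSupported` are hypothetical names, not Tika API.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: wrap a stream in a BufferedInputStream
// only when it cannot mark/reset on its own.
class MarkSupport {
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) {
        // A bare InputStream subclass inherits markSupported() == false, like FileInputStream.
        InputStream raw = new InputStream() {
            @Override
            public int read() {
                return -1; // always reports end of stream
            }
        };
        System.out.println(ensureMarkSupported(raw) instanceof BufferedInputStream); // true

        // ByteArrayInputStream already supports mark/reset, so it is returned unchanged.
        InputStream bytes = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(ensureMarkSupported(bytes) == bytes); // true
    }
}
```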

-----Original Message-----
From: Tim Allison <[email protected]> 
Sent: Thursday, February 11, 2021 4:43 PM
To: [email protected]
Subject: Re: Error calling ImageMagick

Are you still seeing Tesseract .txt files piling up? I'm not able to reproduce
this on Windows, Linux, or macOS.
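To check whether temp files are actually piling up, one option is to count the matching entries in the JVM temp directory (`java.io.tmpdir`) before and after a parse. A JDK-only sketch follows; `TempFileCounter` is a hypothetical helper, and the `apache-tika-` prefix is taken from the temp file names in the ImageMagick log in this thread. Whether the leftover Tesseract output files use the same prefix is an assumption, so adjust the filter to whatever names actually show up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic, not part of Tika: counts regular files in a directory
// whose names start with a given prefix.
class TempFileCounter {
    static long countWithPrefix(Path dir, String prefix) throws IOException {
        // Files.list holds the directory open, so close it via try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith(prefix))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        String prefix = "apache-tika-"; // assumed from the ImageMagick log in this thread
        long before = countWithPrefix(tmp, prefix);
        // ... run the parse being investigated here ...
        long after = countWithPrefix(tmp, prefix);
        // Other processes can also create files here, so treat this as a rough signal.
        System.out.println("new files with prefix " + prefix + ": " + (after - before));
    }
}
```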

This shouldn't cause a problem, but this:

try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {

is more efficient if you do this:

try (TikaInputStream stream = TikaInputStream.get(Paths.get(file))) {
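My understanding of why the file-based form is cheaper: when a TikaInputStream is backed by a file, a parser that needs a real file on disk (as I understand it, the Tesseract parser hands a file path to external processes) can use that file directly via `getPath()`; when it is backed by an arbitrary stream, the content first has to be spooled into a temporary file. The JDK-only sketch below models just that difference. `FileOrStreamSource` is a hypothetical class whose `hasFile()`/`getPath()` names loosely echo TikaInputStream's methods; it is not Tika's implementation, and it only models the "give me a real file" path, not reading.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical, much-simplified analogue of a file-or-stream input (not Tika's code).
class FileOrStreamSource implements AutoCloseable {
    private final InputStream stream; // non-null only when built from a stream
    private Path path;                // backing file, if one exists yet
    private boolean ownsTempFile;     // true when path is a temp copy we created

    private FileOrStreamSource(InputStream stream, Path path) {
        this.stream = stream;
        this.path = path;
    }

    static FileOrStreamSource of(Path path) {
        return new FileOrStreamSource(null, path);
    }

    static FileOrStreamSource of(InputStream stream) {
        return new FileOrStreamSource(stream, null);
    }

    boolean hasFile() {
        return path != null;
    }

    // A consumer that needs a real file (e.g. an external OCR process) calls this.
    Path getPath() throws IOException {
        if (path == null) {
            // Stream-backed: the full content must be copied to disk first.
            path = Files.createTempFile("spooled-", ".tmp");
            ownsTempFile = true;
            Files.copy(stream, path, StandardCopyOption.REPLACE_EXISTING);
        }
        return path;
    }

    @Override
    public void close() throws IOException {
        if (stream != null) {
            stream.close();
        }
        if (ownsTempFile) {
            Files.deleteIfExists(path); // never delete a caller-supplied file
        }
    }

    public static void main(String[] args) throws IOException {
        try (FileOrStreamSource src = FileOrStreamSource.of(new ByteArrayInputStream(new byte[] {1, 2}))) {
            System.out.println(src.hasFile());    // false: no file until one is requested
            Path copy = src.getPath();            // triggers the copy to a temp file
            System.out.println(Files.size(copy)); // 2
        }
    }
}
```

A file-backed source skips the copy entirely, which is the saving the hint is about.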

On Wed, Feb 10, 2021 at 2:23 PM Peter Kronenberg <[email protected]> wrote:
>
> I have also noticed since yesterday that there are files in my temp 
> directory that aren’t being cleaned up.  All of these files contain 
> the output of Tesseract.
>
> From: Peter Kronenberg
> Sent: Wednesday, February 10, 2021 12:35 PM
> To: [email protected]
> Subject: Error calling ImageMagick
>
> I think yesterday’s code introduced a bug.  The temporary file that is 
> created for ImageMagick is missing when ImageMagick runs.
>
> [main] INFO org.apache.tika.parser.ocr.TesseractOCRParser - Tesseract is installed and is being invoked. This can add greatly to processing time.  If you do not want tesseract to be applied to your files see: https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
>
> magick: no images found for operation `-resize' at CLI arg 9 @ error/operation.c/CLIOption/5361.
>
> [main] WARN org.apache.tika.parser.ocr.TesseractOCRParser - ImageMagick failed (commandline: [magick, -density, 300, -depth, 4, -colorspace, gray, -filter, triangle, -resize, 200%, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp, C:\Users\PETERK~1\AppData\Local\Temp\apache-tika-3889844060604687745.tmp])
>
> org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
>     at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
>     at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
>     at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
>     at org.apache.tika.parser.ocr.ImagePreprocessor.process(ImagePreprocessor.java:121)
>     at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:280)
>     at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:248)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.parser.image.AbstractImageParser.parse(AbstractImageParser.java:94)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.torchai.ImageMagick.parse(ImageMagick.java:43)
>     at org.torchai.ImageMagick.main(ImageMagick.java:56)
>
> Text: MARLEY was dead, to begin with. There is no doubt whatever about
> that. The register of his burial was signed by the clergyman, the clerk,
> the undertaker, and the chief mourner. Scrooge signed it. And
> Scrooge’s name was good upon ’Change, for anything he chose to put
> his hand to.
>
> Here’s the code:
>
> public static String parse(String file) throws TikaException, SAXException, IOException {
>     final AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
>     final ParseContext parseContext = new ParseContext();
>     final TesseractOCRConfig tessConfig = new TesseractOCRConfig();
>     parseContext.set(AutoDetectParser.class, parser);
>     parseContext.set(TesseractOCRConfig.class, tessConfig);
>
>     tessConfig.setEnableImageProcessing(true);
>
>     ContentHandler contentHandler = new BodyContentHandler();
>     Metadata metadata = new Metadata();
>
>     try (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(new FileInputStream(file)))) {
>         parser.parse(stream, contentHandler, metadata, parseContext);
>     }
>
>     return contentHandler.toString();
> }
>
>
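Both symptoms in this thread, the ImageMagick input file missing when `magick` runs and Tesseract output left behind in the temp directory, come down to temp-file lifetime. As a generic illustration only (not Tika's code and not the actual fix), here is a JDK-only sketch of the usual pattern: write and close the temp file before the external step reads it, and delete it in a `finally` block so it is removed even when that step fails. `TempFiles`, `PathAction` and `withTempFile` are hypothetical names; the external command is abstracted as a callback.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika, illustrating a temp-file lifecycle.
class TempFiles {
    // Callback that may throw IOException, e.g. one that runs an external tool on the file.
    interface PathAction<T> {
        T apply(Path path) throws IOException;
    }

    static <T> T withTempFile(byte[] content, PathAction<T> action) throws IOException {
        Path tmp = Files.createTempFile("example-", ".tmp");
        try {
            // Files.write opens, writes and closes the file, so the data is on disk
            // before the action tries to read it.
            Files.write(tmp, content);
            return action.apply(tmp);
        } finally {
            // Runs on success and on failure, so nothing is left in the temp directory.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Long size = withTempFile(new byte[] {1, 2, 3}, Files::size);
        System.out.println(size); // prints 3
    }
}
```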
