So, y, that's a timeout on the forked process for tesseract. I've found
that poor quality/noisy images can take a bunch longer for tesseract to
process.

If there's anything we need to fix or make configurable, please open an
issue.

Cheers,

         Tim

On Tue, Jan 18, 2022 at 8:51 PM Peter Kronenberg <[email protected]>
wrote:

> Unrelated to my previous questions.  I’m getting some sort of timeout in
> Tika in TesseractOCRParser.runOCRProcess.  It’s one of the errors that say
> ‘TesseractOCRParser timeout’.  What exactly is it doing here?  Does it
> spawn a separate process to do the OCR?  We’re having some performance
> issues, so in a way, this doesn’t come as a surprise.  Just trying to
> understand a little more what’s going on
>
>
>
> private void runOCRProcess(Process process, int timeout) throws
> IOException, TikaException {
>     process.getOutputStream().close();
>     InputStream out = process.getInputStream();
>     InputStream err = process.getErrorStream();
>     StringBuilder outBuilder = new StringBuilder();
>     StringBuilder errBuilder = new StringBuilder();
>     Thread outThread = this.logStream(out, outBuilder);
>     Thread errThread = this.logStream(err, errBuilder);
>     outThread.start();
>     errThread.start();
>     int exitValue = -2147483648;
>
>     try {
>         boolean finished = process.waitFor((long)timeout, TimeUnit.
> *SECONDS*);
>         if (!finished) {
>             throw new TikaException("TesseractOCRParser timeout");
>         }
>
>         exitValue = process.exitValue();
>     } catch (InterruptedException var12) {
>         Thread.*currentThread*().interrupt();
>         throw new TikaException("TesseractOCRParser interrupted", var12);
>     } catch (IllegalThreadStateException var13) {
>         throw new TikaException("TesseractOCRParser timeout");
>     }
>
>
>
>
>
>
>
>
>
> *Peter Kronenberg*  *| * *Senior AI Analytic ENGINEER *
>
> *C: 703.887.5623*
>
> [image: Torch AI] <http://www.torch.ai/>
>
> 5250 W 116th Pl, Suite 200., Leawood, KS 66211
> WWW.TORCH.AI <http://www.torch.ai/>
>
>
>
>
>

Reply via email to