So, yes, that's a timeout on the forked tesseract process. I've found
that poor-quality or noisy images can take much longer for tesseract to
process.
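To make the "separate process" part concrete, here is a minimal, stdlib-only sketch of the same pattern: fork a child process, drain its stdout and stderr on background threads, and wait for it with a timeout. It is not Tika's code. It assumes Java 11+ and a Unix-like system, the class and method names (`ForkedProcessTimeout`, `runWithTimeout`, `drain`) are hypothetical, and `echo`/`sleep` stand in for a real tesseract command line.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of the fork-and-wait-with-timeout pattern.
class ForkedProcessTimeout {

    // Copies a child's stream into `sink` on a background thread, so the
    // child never stalls because its stdout/stderr pipe filled up.
    static Thread drain(InputStream in, ByteArrayOutputStream sink) {
        Thread t = new Thread(() -> {
            try (InputStream src = in) {
                src.transferTo(sink);
            } catch (IOException e) {
                // The pipe was closed (e.g. the child was killed); stop reading.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Runs `cmd` and returns its stdout; throws TimeoutException if the
    // child has not exited within `timeoutSeconds`.
    static String runWithTimeout(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(cmd).start();
        process.getOutputStream().close(); // nothing is written to the child's stdin
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        Thread outThread = drain(process.getInputStream(), out);
        Thread errThread = drain(process.getErrorStream(), err);

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // don't leave a runaway child behind
            throw new TimeoutException("process did not exit within " + timeoutSeconds + "s");
        }
        // The child has exited; wait for the reader threads to reach end-of-stream.
        outThread.join();
        errThread.join();
        int exitValue = process.exitValue();
        if (exitValue != 0) {
            throw new IOException("exit code " + exitValue + ": "
                    + err.toString(StandardCharsets.UTF_8));
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runWithTimeout(List.of("echo", "hello"), 5));
        try {
            runWithTimeout(List.of("sleep", "10"), 1);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Two design choices worth noting. The output streams are read on their own threads because, per the `java.lang.Process` documentation, a child that cannot write to a full output pipe may block, which would look just like a slow OCR run. The sketch also kills the child with `destroyForcibly()` on timeout; that is this sketch's choice, not a claim about what Tika does after throwing its timeout exception.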
If there's anything we need to fix or make configurable, please open an
issue.
Cheers,
Tim
On Tue, Jan 18, 2022 at 8:51 PM Peter Kronenberg <[email protected]> wrote:
> Unrelated to my previous questions: I’m getting some sort of timeout in
> Tika’s TesseractOCRParser.runOCRProcess. It’s one of the errors that say
> ‘TesseractOCRParser timeout’. What exactly is it doing here? Does it
> spawn a separate process to do the OCR? We’re having some performance
> issues, so in a way this doesn’t come as a surprise. I’m just trying to
> understand a little more about what’s going on.
>
> private void runOCRProcess(Process process, int timeout)
>         throws IOException, TikaException {
>     // Close the child's stdin; nothing is written to it.
>     process.getOutputStream().close();
>     InputStream out = process.getInputStream();
>     InputStream err = process.getErrorStream();
>     StringBuilder outBuilder = new StringBuilder();
>     StringBuilder errBuilder = new StringBuilder();
>     // Read the child's stdout/stderr on background threads (logStream not shown).
>     Thread outThread = this.logStream(out, outBuilder);
>     Thread errThread = this.logStream(err, errBuilder);
>     outThread.start();
>     errThread.start();
>     int exitValue = Integer.MIN_VALUE;
>
>     try {
>         // Wait up to `timeout` seconds for the tesseract process to exit.
>         boolean finished = process.waitFor(timeout, TimeUnit.SECONDS);
>         if (!finished) {
>             throw new TikaException("TesseractOCRParser timeout");
>         }
>         exitValue = process.exitValue();
>     } catch (InterruptedException e) {
>         Thread.currentThread().interrupt();
>         throw new TikaException("TesseractOCRParser interrupted", e);
>     } catch (IllegalThreadStateException e) {
>         throw new TikaException("TesseractOCRParser timeout");
>     }
>     // ... (rest of the method not quoted)
> }
>
> Peter Kronenberg | Senior AI Analytic Engineer
> C: 703.887.5623
> 5250 W 116th Pl, Suite 200, Leawood, KS 66211
> WWW.TORCH.AI <http://www.torch.ai/>