120 seconds

https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java#L95
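As a rough sketch of how the value handed to runOCRProcess seems to be resolved, going by the call quoted below: the config's timeout in seconds (120 by default) is converted to milliseconds and passed in as the fallback, and a timeout found in the ParseContext would presumably take precedence over it. The `TimeoutResolver` class, the `Map` standing in for ParseContext, and the `taskTimeoutMillis` key are all hypothetical and only illustrate the fallback logic; this is not Tika's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not Tika code: "per-parse override, else config default".
public class TimeoutResolver {
    // Default timeout in seconds, per the linked TesseractOCRConfig source line.
    static final int DEFAULT_TIMEOUT_SECONDS = 120;

    // Assumption: a value present in the context wins; otherwise the default is used.
    static long resolveTimeoutMillis(Map<String, Long> context, long defaultMillis) {
        Long override = context.get("taskTimeoutMillis");
        return override != null ? override : defaultMillis;
    }

    public static void main(String[] args) {
        Map<String, Long> context = new HashMap<>();
        // No override set: falls back to 120 s expressed in milliseconds.
        System.out.println(resolveTimeoutMillis(context, DEFAULT_TIMEOUT_SECONDS * 1000L));
        // Override set: the per-parse value is returned instead.
        context.put("taskTimeoutMillis", 300_000L);
        System.out.println(resolveTimeoutMillis(context, DEFAULT_TIMEOUT_SECONDS * 1000L));
    }
}
```

If that reading is right, there would be two places to change the timeout: the seconds value on the OCR config, or a per-parse value placed in the ParseContext.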

On Thu, Jan 20, 2022 at 2:07 PM Peter Kronenberg <[email protected]>
wrote:

> At this point, I think it’s mostly our problem.   But still want to
> understand what Tika is doing.  What is the default timeout?
>
>
>
> This is what is passed in to runOCRProcess
>
> long timeoutMillis = TikaTaskTimeout.getTimeoutMillis(parseContext,
>         config.getTimeoutSeconds() * 1000);
>
>
>
> but I can’t quite figure out where it’s getting the default from or if
> it’s possible to override it.
>
>
>
> *Peter Kronenberg*  *| * *Senior AI Analytic ENGINEER *
>
> *C: 703.887.5623 *
>
> 5250 W 116th Pl, Suite 200., Leawood, KS 66211
> WWW.TORCH.AI <http://www.torch.ai/>
>
>
>
>
>
> *From:* Tim Allison <[email protected]>
> *Sent:* Thursday, January 20, 2022 12:40 PM
> *To:* [email protected]
> *Subject:* Re: TesseractOCRParser timeout
>
>
>
> So, yes, that's a timeout on the forked process for tesseract. I've found
> that poor-quality or noisy images can take much longer for tesseract to
> process.
>
>
>
> If there's anything we need to fix or make configurable, please open an
> issue.
>
>
>
> Cheers,
>
>
>
>          Tim
>
>
>
> On Tue, Jan 18, 2022 at 8:51 PM Peter Kronenberg <
> [email protected]> wrote:
>
> Unrelated to my previous questions.  I’m getting some sort of timeout in
> Tika in TesseractOCRParser.runOCRProcess.  It’s one of the errors that say
> ‘TesseractOCRParser timeout’.  What exactly is it doing here?  Does it
> spawn a separate process to do the OCR?  We’re having some performance
> issues, so in a way, this doesn’t come as a surprise.  Just trying to
> understand a little more what’s going on
>
>
>
> private void runOCRProcess(Process process, int timeout) throws
> IOException, TikaException {
>     process.getOutputStream().close();
>     InputStream out = process.getInputStream();
>     InputStream err = process.getErrorStream();
>     StringBuilder outBuilder = new StringBuilder();
>     StringBuilder errBuilder = new StringBuilder();
>     Thread outThread = this.logStream(out, outBuilder);
>     Thread errThread = this.logStream(err, errBuilder);
>     outThread.start();
>     errThread.start();
>     int exitValue = -2147483648;
>
>     try {
>         boolean finished = process.waitFor((long)timeout, TimeUnit.SECONDS);
>         if (!finished) {
>             throw new TikaException("TesseractOCRParser timeout");
>         }
>
>         exitValue = process.exitValue();
>     } catch (InterruptedException var12) {
>         Thread.currentThread().interrupt();
>         throw new TikaException("TesseractOCRParser interrupted", var12);
>     } catch (IllegalThreadStateException var13) {
>         throw new TikaException("TesseractOCRParser timeout");
>     }
>

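On the question in the quoted thread about what runOCRProcess is doing: the pattern is to fork an external process, wait on it for a bounded time, and fail if it has not finished. Below is a minimal, stdlib-only sketch of that pattern. The `ForkWithTimeout` class and `runWithTimeout` method are made-up names, the `java -version` command is just a convenient child process, and killing the child on timeout is my own choice for the sketch; the quoted snippet does not show what Tika does with the process after a timeout.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical example, not Tika code: fork a process and bound how long we wait for it.
public class ForkWithTimeout {
    // Returns the child's exit code, or throws if it does not finish in time.
    static int runWithTimeout(ProcessBuilder builder, long timeoutMillis)
            throws IOException, InterruptedException {
        // Discard the child's output so it cannot block on a full pipe.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();
        // The child gets no stdin, mirroring the quoted snippet.
        process.getOutputStream().close();
        boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            // Assumption for this sketch: kill the child rather than leave it running.
            process.destroyForcibly();
            throw new IOException("timeout after " + timeoutMillis + " ms");
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Use the running JVM's own executable as a portable child process.
        String java = ProcessHandle.current().info().command().orElse("java");
        int exit = runWithTimeout(new ProcessBuilder(java, "-version"), 120_000L);
        System.out.println("exit code: " + exit);
    }
}
```

With slow or noisy input the child simply takes longer, so the wait expires and the caller sees a timeout error rather than a result.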