Unrelated to my previous questions: I'm getting a timeout in Tika, in
TesseractOCRParser.runOCRProcess. It's one of the two places below that throw
'TesseractOCRParser timeout'. What exactly is this method doing? Does it spawn a
separate process to do the OCR? We're having some performance issues, so in a
way this doesn't come as a surprise. I'm just trying to understand a little more
about what's going on.

private void runOCRProcess(Process process, int timeout) throws IOException, TikaException {
    // Nothing is written to the external process's stdin, so close it right away.
    process.getOutputStream().close();
    InputStream out = process.getInputStream();
    InputStream err = process.getErrorStream();
    StringBuilder outBuilder = new StringBuilder();
    StringBuilder errBuilder = new StringBuilder();
    // Read stdout/stderr into the builders on background threads, so the
    // child process doesn't block on a full pipe buffer.
    Thread outThread = logStream(out, outBuilder);
    Thread errThread = logStream(err, errBuilder);
    outThread.start();
    errThread.start();
    int exitValue = Integer.MIN_VALUE;

    try {
        // Wait at most `timeout` seconds for the external process to exit.
        boolean finished = process.waitFor(timeout, TimeUnit.SECONDS);
        if (!finished) {
            // The process was still running when the timeout expired.
            throw new TikaException("TesseractOCRParser timeout");
        }

        exitValue = process.exitValue();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new TikaException("TesseractOCRParser interrupted", e);
    } catch (IllegalThreadStateException e) {
        // exitValue() throws this if the process has not terminated yet.
        throw new TikaException("TesseractOCRParser timeout");
    }
    // ... (rest of method not shown)
}
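For context, the snippet above follows the standard JDK pattern for running an external program with a deadline. The method receives an already-started `Process`, so tesseract is running as a separate OS process launched elsewhere in the parser; the two threads drain its output, and `waitFor(timeout, TimeUnit.SECONDS)` stops waiting after `timeout` seconds. Below is a minimal standalone sketch of the same pattern, not Tika's code: the class `ExternalProcessWithTimeout`, its `Result` type and its `run`/`drain` methods are hypothetical names, it returns a flag instead of throwing `TikaException`, and it assumes a Unix-like system with `echo` and `sleep` on the PATH. One detail worth noting: `waitFor` with a timeout only stops *waiting*; the child keeps running until something calls `destroy()` or `destroyForcibly()` on it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical standalone sketch of "run an external process with a timeout";
// this is NOT Tika's implementation.
class ExternalProcessWithTimeout {

    static final class Result {
        final boolean timedOut;
        final int exitValue; // Integer.MIN_VALUE when the process timed out
        final String stdout;
        final String stderr;

        Result(boolean timedOut, int exitValue, String stdout, String stderr) {
            this.timedOut = timedOut;
            this.exitValue = exitValue;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    // Reads a child stream line by line into `sink` on its own thread, so the
    // child never blocks writing to a full stdout/stderr pipe.
    private static Thread drain(InputStream in, StringBuilder sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    synchronized (sink) {
                        sink.append(line).append('\n');
                    }
                }
            } catch (IOException ignored) {
                // Stream closed (e.g. the process was destroyed): stop reading.
            }
        });
        t.setDaemon(true);
        return t;
    }

    // Reads a builder under the same lock the drain thread appends with.
    private static String snapshot(StringBuilder sb) {
        synchronized (sb) {
            return sb.toString();
        }
    }

    static Result run(long timeoutSeconds, String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start(); // separate OS process
        process.getOutputStream().close();                     // nothing to send on stdin
        StringBuilder out = new StringBuilder();
        StringBuilder err = new StringBuilder();
        Thread outThread = drain(process.getInputStream(), out);
        Thread errThread = drain(process.getErrorStream(), err);
        outThread.start();
        errThread.start();

        boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            // waitFor(timeout) only stopped waiting; kill the child explicitly.
            process.destroyForcibly().waitFor();
            outThread.join(1000);
            errThread.join(1000);
            return new Result(true, Integer.MIN_VALUE, snapshot(out), snapshot(err));
        }
        outThread.join(); // make sure all output has been read
        errThread.join();
        return new Result(false, process.exitValue(), snapshot(out), snapshot(err));
    }

    public static void main(String[] args) throws Exception {
        Result fast = run(5, "echo", "hello");
        System.out.println(fast.timedOut + " " + fast.exitValue + " " + fast.stdout.trim());
        Result slow = run(1, "sleep", "10");
        System.out.println(slow.timedOut);
    }
}
```

Running `main` should print `false 0 hello` and then `true`, with the `sleep` process killed after about one second. The same thing applies to tesseract: if a page simply takes longer to OCR than the configured timeout (large or high-resolution images, an overloaded machine), the timeout fires even though nothing is hung, which would fit the performance issues described above. Assumption to verify for your Tika version: the `timeout` value is normally configured through `TesseractOCRConfig` (its `timeoutSeconds` setting).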




Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
5250 W 116th Pl, Suite 200., Leawood, KS 66211
WWW.TORCH.AI<http://www.torch.ai/>

