It seems to be specific to the document in question. However, I'm afraid I can't post the document because it contains sensitive information. I can try scrubbing the information with an image editing tool and see whether the error still occurs.
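For the version details requested in the quoted message below, a small helper along these lines could capture the output of `tesseract -v` for pasting into a reply. This is only a sketch: the function name `tesseract_report` is hypothetical, and it assumes the binary is reachable on PATH under the name `tesseract`.

```python
import shutil
import subprocess


def tesseract_report(binary: str = "tesseract") -> str:
    """Return the output of `<binary> -v`, or a short note if the binary is missing."""
    path = shutil.which(binary)
    if path is None:
        return f"{binary}: not found on PATH"
    # Capture both streams, since some builds print version info to stderr.
    result = subprocess.run(
        [path, "-v"], capture_output=True, text=True, check=False
    )
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    print(tesseract_report())
```

Running it under the same account as the failing command (here, `userx`) should show the build that account actually picks up.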
On Monday, June 6, 2022 at 11:21:25 AM UTC-5 zdenop wrote:

> Can you please share ocrIn_1.tif + info which tessdata version you use?
> + output of 'tesseract -v'
>
> Zdenko
>
>
> On Mon, 6 Jun 2022 at 17:53, Lucas L. <[email protected]> wrote:
>
>> Hi, I'm trying to upgrade Tesseract in our Ubuntu 20.04 VMs used to OCR
>> documents to Tesseract 5.1 from 4.1.1, both versions were built from source
>> on that VM. 4.1.1 worked, but 5.1 throws an error that I can't seem to find
>> anywhere else online:
>>
>> sudo -u userx tesseract --loglevel ALL --oem 1 -l eng
>> /opt/.../pdfprocessor/test/ocr-working/1/ocrIn_1.tif
>> /opt/.../pdfprocessor/test/test pdf
>> Error in selectDefaultPdfEncoding: type selection failure
>> Error during processing.
>>
>> I have tried the training data from both "tessdata" and "tessdata_best"
>> and got the same error. Any help would be appreciated.
>>
>> Thanks,
>> Lucas LeBlanc
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/6a8a3c7c-5c09-478e-a897-dca4314646e6n%40googlegroups.com
>>
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/8da6b087-3f91-4a33-a2b4-d9daa082570en%40googlegroups.com.

