Oh yeah, here's the output of tesseract -v:

tesseract 5.1.0
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

On Monday, June 6, 2022 at 11:46:30 AM UTC-5 Lucas L. wrote:

> It seems to be specific to the document in question. However, I'm afraid I
> can't post the document because it contains sensitive information. I guess
> I can try to scrub the info using an image editing tool and see if the
> error still occurs.
>
> On Monday, June 6, 2022 at 11:21:25 AM UTC-5 zdenop wrote:
>
>> Can you please share ocrIn_1.tif + info on which tessdata version you use
>> + output of 'tesseract -v'?
>>
>> Zdenko
>>
>>
>> On Mon, 6 Jun 2022 at 17:53, Lucas L. <infinit...@gmail.com> wrote:
>>
>>> Hi, I'm trying to upgrade Tesseract on the Ubuntu 20.04 VMs we use to OCR
>>> documents from 4.1.1 to 5.1; both versions were built from source
>>> on the VM. 4.1.1 worked, but 5.1 throws an error that I can't seem to find
>>> anywhere else online:
>>>
>>> sudo -u userx tesseract --loglevel ALL --oem 1 -l eng /opt/.../pdfprocessor/test/ocr-working/1/ocrIn_1.tif /opt/.../pdfprocessor/test/test pdf
>>> Error in selectDefaultPdfEncoding: type selection failure
>>> Error during processing.
>>>
>>> I have tried the training data from both "tessdata" and "tessdata_best"
>>> and got the same error. Any help would be appreciated.
>>>
>>> Thanks,
>>> Lucas LeBlanc
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/6a8a3c7c-5c09-478e-a897-dca4314646e6n%40googlegroups.com.
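
Since the TIFF itself can't be shared, one option is to share only its header fields (dimensions, bit depth, samples per pixel, photometric interpretation, and whether a colormap is present). These describe the image format without exposing any of its content. This may matter because Leptonica's selectDefaultPdfEncoding picks a PDF image encoding based on properties such as bit depth and colormap, so an unusual format in ocrIn_1.tif is one plausible (unconfirmed) trigger for "type selection failure". Below is a minimal sketch using only the Python standard library. It assumes a classic (non-BigTIFF) file, reads only the first page's IFD, and the function name tiff_info is hypothetical.

```python
import struct
import sys

# Baseline TIFF tag IDs of interest (TIFF 6.0 specification).
TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    277: "SamplesPerPixel",
    320: "ColorMap",
}


def tiff_info(data: bytes) -> dict:
    """Hypothetical helper: read selected tags from the first IFD of a classic TIFF.

    Only SHORT (type 3) and LONG (type 4) values are decoded; BigTIFF is not handled.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file (bad byte-order mark)")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("unsupported TIFF variant (e.g. BigTIFF)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    info = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        entry = data[start:start + 12]
        tag, typ, n = struct.unpack(order + "HHI", entry[:8])
        if tag not in TAGS:
            continue
        if tag == 320:
            # Only note that a colormap exists; its palette is not needed here.
            info["ColorMap"] = True
            continue
        if typ not in (3, 4):
            continue
        size = 2 if typ == 3 else 4
        fmt = order + ("H" if typ == 3 else "I") * n
        if size * n <= 4:
            raw = entry[8:8 + size * n]  # value fits inline in the entry
        else:
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + size * n]  # value stored elsewhere in the file
        values = struct.unpack(fmt, raw)
        info[TAGS[tag]] = values[0] if n == 1 else list(values)
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(tiff_info(f.read()))
```

Saved under any name (e.g. a hypothetical tiff_info.py) and run as `python3 tiff_info.py ocrIn_1.tif`, it prints a small dictionary that can be pasted into the thread. If libtiff's `tiffinfo` utility is installed, it reports the same fields.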