No luck, sadly. When I edited the image in IrfanView to block out the sensitive parts and tried to OCR it again, the error didn't occur. I'm not sure what changed in the .tiff image file. Any ideas on what kind of image metadata could cause this "selectDefaultPdfEncoding" error?
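One way to narrow this down is to compare the depth-related TIFF tags of the original and the edited file. Below is a minimal Python sketch that uses only the standard library. It reads BitsPerSample, SamplesPerPixel and PhotometricInterpretation from the first image directory of a classic (non-BigTIFF) TIFF. The helpers `read_depth_tags`, `guess_pix_depth` and `pdf_encoding_like_leptonica` are hypothetical. The last one is my assumption of how Leptonica's `selectDefaultPdfEncoding()` picks an encoding: G4 for 1 bpp, Flate for colormapped or 2/4 bpp images, and JPEG or Flate for 8/32 bpp images. Under that assumption, a 16 bpp grayscale image matches no branch, which would produce "type selection failure".

```python
import struct
import sys
from typing import Dict, List, Optional

# TIFF 6.0 tag IDs relevant to bit depth.
BITS_PER_SAMPLE = 258
PHOTOMETRIC = 262        # value 3 means a palette (colormapped) image
SAMPLES_PER_PIXEL = 277
SHORT = 3                # TIFF field type for 16-bit unsigned integers


def read_depth_tags(data: bytes) -> Dict[int, List[int]]:
    """Read SHORT-typed depth tags from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags: Dict[int, List[int]] = {}
    for i in range(count):
        start = ifd + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n, field = struct.unpack(order + "HHI4s", data[start:start + 12])
        if tag not in (BITS_PER_SAMPLE, PHOTOMETRIC, SAMPLES_PER_PIXEL) or ftype != SHORT:
            continue
        if n <= 2:
            raw = field[:2 * n]  # values fit inline in the 4-byte field
        else:
            (offset,) = struct.unpack(order + "I", field)
            raw = data[offset:offset + 2 * n]  # values are stored at an offset
        tags[tag] = list(struct.unpack(order + "H" * n, raw))
    return tags


def guess_pix_depth(tags: Dict[int, List[int]]) -> int:
    """Hypothetical: one-sample images keep their bit depth; multi-sample
    (RGB/RGBA) images are assumed to become 32 bpp in Leptonica."""
    bps = tags.get(BITS_PER_SAMPLE, [1])[0]  # TIFF default is 1
    spp = tags.get(SAMPLES_PER_PIXEL, [1])[0]
    return bps if spp == 1 else 32


def pdf_encoding_like_leptonica(depth: int, has_colormap: bool) -> Optional[str]:
    """Hypothetical mirror of the branches in Leptonica's
    selectDefaultPdfEncoding(); None marks 'type selection failure'."""
    if depth == 1:
        return "G4"
    if has_colormap or depth in (2, 4):
        return "FLATE"
    if depth in (8, 32):
        return "JPEG"  # Leptonica may choose FLATE for 8 bpp images with few colors
    return None  # e.g. 16 bpp grayscale has no matching branch


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        tags = read_depth_tags(f.read())
    depth = guess_pix_depth(tags)
    palette = tags.get(PHOTOMETRIC, [0])[0] == 3
    print(tags, depth, pdf_encoding_like_leptonica(depth, palette))
```

Running it as `python check_depth.py ocrIn_1.tif` on both files would print the tags side by side. If the original reports BitsPerSample 16 and the IrfanView copy reports 8, that would be consistent with the bit depth, rather than other metadata, being the trigger.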
The only difference I can see between the two files is that the original has a 16 BPP color depth. Both use LZW compression.

On Monday, June 6, 2022 at 11:47:31 AM UTC-5 Lucas L. wrote:

> Oh yeah, here's the output of tesseract -v:
>
> tesseract 5.1.0
> leptonica-1.79.0
> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
> Found AVX2
> Found AVX
> Found FMA
> Found SSE4.1
> Found OpenMP 201511
> Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
>
> On Monday, June 6, 2022 at 11:46:30 AM UTC-5 Lucas L. wrote:
>
>> It seems to be specific to the document in question. However, I'm afraid I can't post the document because it has sensitive information on it. I guess I can try to scrub the info using an image editing tool and see if the error still occurs.
>>
>> On Monday, June 6, 2022 at 11:21:25 AM UTC-5 zdenop wrote:
>>
>>> Can you please share ocrIn_1.tif + info on which tessdata version you use? + output of 'tesseract -v'
>>>
>>> Zdenko
>>>
>>> On Mon, 6 Jun 2022 at 17:53 Lucas L. <[email protected]> wrote:
>>>
>>>> Hi, I'm trying to upgrade Tesseract from 4.1.1 to 5.1 on the Ubuntu 20.04 VMs we use to OCR documents; both versions were built from source on the VM. 4.1.1 worked, but 5.1 throws an error that I can't find anywhere else online:
>>>>
>>>> sudo -u userx tesseract --loglevel ALL --oem 1 -l eng /opt/.../pdfprocessor/test/ocr-working/1/ocrIn_1.tif /opt/.../pdfprocessor/test/test pdf
>>>> Error in selectDefaultPdfEncoding: type selection failure
>>>> Error during processing.
>>>>
>>>> I have tried the training data from both "tessdata" and "tessdata_best" and got the same error. Any help would be appreciated.
>>>>
>>>> Thanks,
>>>> Lucas LeBlanc
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6a8a3c7c-5c09-478e-a897-dca4314646e6n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5dd36e21-cd85-4938-8a1d-e7ea504a715bn%40googlegroups.com.

