I found the parameters and ran:

"C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."

It is not working. The output is still:

"uw BTW nummer:: NLOO7900000B01"
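One workaround that does not depend on the Tesseract settings would be to post-process the recognized text. A Dutch BTW number has a fixed shape (NL, nine digits, B, two digits), so a letter O in a digit position can be mapped back to 0. Below is a minimal sketch, assuming Python is available for the post-processing step. The name fix_btw_number is made up for illustration, and it only addresses the O/0 confusion, not the "Tel £01" case.

```python
import re

# Dutch BTW (VAT) numbers have the shape NL + 9 digits + "B" + 2 digits.
# Allow the letter O wherever a digit is expected so misread zeros still match.
BTW_PATTERN = re.compile(r"\bNL([0-9O]{9})B([0-9O]{2})\b")


def fix_btw_number(text: str) -> str:
    """Replace letter O with digit 0 inside the digit positions of a BTW number."""

    def _repair(match: re.Match) -> str:
        first = match.group(1).replace("O", "0")
        last = match.group(2).replace("O", "0")
        return f"NL{first}B{last}"

    # Text that does not look like a BTW number is left untouched.
    return BTW_PATTERN.sub(_repair, text)


if __name__ == "__main__":
    print(fix_btw_number("uw BTW nummer:: NLOO7900000B01"))
```

Because the pattern only matches the full NL...B.. shape, ordinary words containing an O are not altered.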
Any other ideas?

On Thursday, 21 September 2023 at 22:25:12 UTC+2, elvi...@gmail.com wrote:

> White list the digits so that the O will not confuse it.
> You can also try --psm 13 if all of your texts are single line.
>
> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powe...@gmail.com> wrote:
>
>> Hi.
>> I am trying to use the tesseract engine instead of the nuance engine.
>> When i currently use tesseract.exe the image it returns a few strange
>> characters.
>> 2x OO instead of 00
>> "uw BTW nummer:: NLOO7900000B01"
>> instead of
>> "uw BTW nummer:: NL007900000B01"
>> and
>> "Tel £01"
>> instead of
>> "Tel : 01"
>> but "Tel : 0168-452452" is recognized ok.
>>
>> I see no optimization using
>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>> because it are really clean documents.
>>
>> Am i missing some parameters ? Like a second run, or more accurate run
>> etc.
>> Maybe compile tesseract.exe myself with different more quality parameters ?
>>
>> Thanks,
>> Alwin

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1658873b-4b35-4273-ac1b-629689ee70d1n%40googlegroups.com.