I solved this problem, but when I reverted to an older Tesseract version, the accuracy dropped from 99% to a shocking 75%. I can't believe this is happening. Why would anyone remove such a useful feature from their software? Do I really have to spend 10 hours learning how to train this thing to recognize new characters? I've never done that before, and I would much prefer the solution I almost had, which required only one line of code. Further, if I do have to train it, how many images will I need?
On Sunday, July 14, 2019 at 9:00:10 PM UTC-7, Kyle Foley wrote:
>
> I'm trying to set the tessedit_char_whitelist but it does not work in
> tesseract 4 so I read here
>
> https://github.com/tesseract-ocr/tesseract/issues/751#issuecomment-423521780
>
> from amitdo that I need to use --oem 0. I put in the following syntax
>
> str4 = pytesseract.image_to_string(Image.open(str3),
>     config='--oem 0 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś')
>
> and now I get the following error message:
>
> pytesseract.pytesseract.TesseractError: (1, "Failed loading language 'eng'
> Tesseract couldn't load any languages! Could not initialize tesseract.")
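
For what it's worth, here is a rough sketch of a possible stopgap that avoids both the legacy engine and retraining: run OCR with whatever engine loads, then drop any character that is not in the whitelist. This only discards unwanted characters; it cannot make the engine choose a whitelisted character instead, so it may not recover the accuracy a real whitelist gives. The helper names `build_config` and `filter_to_whitelist` are made up for illustration and are not part of pytesseract. As far as I know, the "Failed loading language" error in the quoted message is commonly reported when the installed eng.traineddata contains no legacy-engine data, which `--oem 0` requires, but I have not verified that this is the cause here.

```python
import unicodedata

# Whitelist copied verbatim from the quoted message.
WHITELIST = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś"


def build_config(whitelist, oem=None):
    """Build the config string passed to pytesseract (hypothetical helper)."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def filter_to_whitelist(text, whitelist=WHITELIST, keep_whitespace=True):
    """Drop every character of `text` that is not whitelisted (hypothetical helper).

    Both strings are normalized to NFC first, so that a decomposed
    'h' + combining dot below compares equal to the precomposed 'ḥ'.
    """
    allowed = set(unicodedata.normalize("NFC", whitelist))
    normalized = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in normalized
        if ch in allowed or (keep_whitespace and ch.isspace())
    )
```

Usage would look something like `filter_to_whitelist(pytesseract.image_to_string(Image.open(str3)))`, assuming pytesseract and Pillow are installed and the default engine loads.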

