I solved this problem, but when I reverted to an old Tesseract the accuracy 
went down from 99% to a shocking 75%.  I can't believe this is happening.  
Why would anyone remove an entirely useful feature from their software?  Do 
I really have to spend 10 hours learning how to train this thing to 
understand new characters?  I've never done that before, and I would prefer 
the solution that I almost had, which only required one line of code.  
Further, if I do have to train it, how many images will I need?

On Sunday, July 14, 2019 at 9:00:10 PM UTC-7, Kyle Foley wrote:
>
> I'm trying to set tessedit_char_whitelist, but it does not work in 
> Tesseract 4. I read in this comment from amitdo
>
> https://github.com/tesseract-ocr/tesseract/issues/751#issuecomment-423521780
>
> that I need to use --oem 0, so I used the following call:
>
> str4 = pytesseract.image_to_string(
>     Image.open(str3),
>     config='--oem 0 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś')
>
>
> and now I get the following error message:
>
>
>
> pytesseract.pytesseract.TesseractError: (1, "Failed loading language 'eng' 
> Tesseract couldn't load any languages! Could not initialize tesseract.")
>
>
>
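
For reference, the "Failed loading language 'eng'" error with --oem 0 usually means the installed eng.traineddata has no legacy model in it. The tessdata_fast and tessdata_best files are LSTM-only, while the files in the tesseract-ocr/tessdata repository include the legacy engine. Below is a minimal sketch, not a confirmed fix for this setup. It assumes such a file has been downloaded into a directory of your choosing. The path /opt/tessdata and the helper names build_config and keep_allowed are hypothetical. The second helper is a different technique from the whitelist: it only drops unwanted characters after recognition and does not steer the recognizer.

```python
import shlex

# Whitelist copied verbatim from the quoted message.
ALLOWED = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghijklmnopqrstuvwxyzḥś"


def build_config(whitelist, tessdata_dir=None):
    """Build a config string for the legacy engine (--oem 0) with a whitelist.

    tessdata_dir is assumed to contain an eng.traineddata file that
    includes the legacy model; the path is supplied by the caller.
    """
    parts = []
    if tessdata_dir is not None:
        parts += ["--tessdata-dir", shlex.quote(tessdata_dir)]
    parts += ["--oem", "0", "-c", "tessedit_char_whitelist=" + whitelist]
    return " ".join(parts)


def keep_allowed(text, allowed=ALLOWED):
    """Post-filter OCR output: keep whitelisted characters and whitespace."""
    return "".join(ch for ch in text if ch in allowed or ch.isspace())


# Usage (requires pytesseract, Pillow, and a Tesseract install):
#   import pytesseract
#   from PIL import Image
#   config = build_config(ALLOWED, "/opt/tessdata")  # hypothetical path
#   raw = pytesseract.image_to_string(Image.open(str3), config=config)
#   cleaned = keep_allowed(raw)
```

Post-filtering cannot recover characters the engine misread, so it is only a fallback if the legacy engine's accuracy is not acceptable.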
