[tesseract-ocr] Re: 6od instead of God

misonistic Tue, 11 Nov 2014 20:50:12 -0800

Yes, I can pre-process each individual image to make it work, but 
unfortunately I've been unable to come up with a consistent pre-processing 
method that would work in general. I've been trying for a while now.
I've known that retraining is an option from the beginning but I'm 
concerned that it may fix some problems and introduce others. The default 
eng.traineddata works pretty well except that every once in a while a 
character is misread.
I've just downloaded and tried VietOCR 4 beta, and while it does get this 
one right, it regrettably still misses quite a few others.

What I really need is a dictionary lookup for every non-word or garbage
word tesseract finds that would return the best dictionary match. I'm
thinking about writing my own but that would be absurd if tesseract is
supposed to already contain this functionality. I understand from Ray's
explanation here
<https://groups.google.com/forum/#!searchin/tesseract-ocr/dictionary/tesseract-ocr/VJXE40iksnI/tr-_9O4F5OcJ>
that the correct character choice is not ranked high enough to be
considered for a dictionary match, and that would make sense if I didn't
have an ambigs rule for it. But if I have an explicit unicharambigs rule
that says consider replacing this character with another to look for a
dictionary match, I don't understand why tesseract still ends up preferring
a non-word over a dictionary match.
I keep thinking I must be missing some obscure config setting. I've already
tried tweaking a whole bunch of them from this list
<http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version> but to
no avail.
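
For illustration, the kind of post-OCR lookup described above could be
sketched roughly as follows. This is not tesseract functionality, just a
standalone Python sketch using the standard difflib module. The word list,
the LOOKALIKES table (only 6 -> G comes from this thread) and the function
name correct_token are assumptions made up for the example.

```python
import difflib

# Tiny stand-in word list (assumption); a real one would be loaded from a file.
WORDS = ["God", "good", "gold", "the", "word"]

# Hypothetical look-alike substitutions; only 6 -> G comes from this thread.
LOOKALIKES = {"6": "G", "0": "O", "1": "l"}


def correct_token(token, words=WORDS, cutoff=0.6):
    """Return token if it is a dictionary word, else the best match found."""
    lowered = {w.lower() for w in words}
    if token.lower() in lowered:
        return token
    # Step 1: swap look-alike characters, similar in spirit to an ambigs rule.
    swapped = "".join(LOOKALIKES.get(ch, ch) for ch in token)
    if swapped.lower() in lowered:
        return swapped
    # Step 2: fall back to the closest dictionary word by similarity ratio.
    matches = difflib.get_close_matches(token, words, n=1, cutoff=cutoff)
    # Leave the token untouched when nothing is close enough.
    return matches[0] if matches else token


if __name__ == "__main__":
    for tok in ["6od", "the", "zzzz"]:
        print(tok, "->", correct_token(tok))
```

Trying the explicit substitutions before the fuzzy match keeps the
correction conservative: a token is only replaced by a similarity guess
when no known character confusion explains it, and the cutoff controls how
aggressive that fallback is.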
