Re: Individual character variation lists

Nick White Wed, 12 Mar 2014 05:56:23 -0700

Hi John,

On Wed, Mar 12, 2014 at 04:57:38AM -0700, John Green wrote:
> Bottom line up front: Has anyone compiled a list of common misperceptions on
> the part of tesseract? E.g.: e is often seen as o and l can be mistaken for 1,
> etc.


Tesseract has some basic information of that sort built into its 
training files, which it uses to help recognition.

You can see the list for English by unpacking the English 
.traineddata file:

  combine_tessdata -u /path/to/eng.traineddata eng.

Then look at the resulting eng.unicharambigs file. The format is 
documented in the unicharambigs(5) man page, and it's pretty 
straightforward.
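
If you want the ambiguities as data rather than reading the file by eye, 
here is a rough Python sketch. It assumes the tab-separated "v1" layout 
described in the man page (count, space-separated characters, count, 
space-separated characters, type flag); the function name 
parse_unicharambigs is my own, not part of Tesseract. Newer releases may 
also use a simpler space-separated "v2" layout, which this sketch does not 
handle, so check the first line of your file before relying on it.

```python
def parse_unicharambigs(text):
    """Parse v1-style unicharambigs text.

    Returns a list of (source, replacement, mandatory) tuples, e.g.
    ("m", "rn", False) for the line "1<TAB>m<TAB>2<TAB>r n<TAB>0".
    Assumes tab-separated fields; this is a sketch, not Tesseract's parser.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            # Skips the version header ("v1"), blank lines, and anything
            # that does not look like a five-field ambiguity entry.
            continue
        _src_count, src, _dst_count, dst, kind = fields[:5]
        # Characters inside each n-gram field are separated by single spaces.
        source = "".join(src.split(" "))
        replacement = "".join(dst.split(" "))
        # Type flag: 1 = mandatory substitution, 0 = optional alternative.
        entries.append((source, replacement, kind.strip() == "1"))
    return entries


if __name__ == "__main__":
    # Path is whatever combine_tessdata -u wrote out for you.
    with open("eng.unicharambigs", encoding="utf-8") as f:
        for source, replacement, mandatory in parse_unicharambigs(f.read()):
            print(repr(source), "->", repr(replacement),
                  "(mandatory)" if mandatory else "(optional)")
```

Each tuple says that the first string may be a misreading of the second; 
the flag tells you whether Tesseract always substitutes it or only 
considers it as an alternative.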

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

