Recognising Known Characters

jacob . chiong Mon, 28 Jan 2013 02:37:58 -0800

I followed the thread
https://groups.google.com/forum/?fromgroups=#!topic/tesseract-ocr/S9CIK3jOMWw
and thought I had it working, but was surprised to find Tesseract confusing
"B" with "8" even though I had specified an uppercase letter for that
position in the pattern. So I think I must have misunderstood something.


I have a config file named "neric" in the folder
"..\Tesseract-OCR\tessdata\configs" with the following contents:

load_system_dawg     F
load_freq_dawg       F
user_patterns_suffix neric

I have a pattern file named "eng.neric" in "..\Tesseract-OCR\tessdata" 
whose contents are:

\A\d\d\d\d\d\d\d\A

where the first and last characters are uppercase letters, with 7 digits
in between.
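For anyone trying to reproduce this setup, here is a minimal Python sketch that writes both files exactly as described above. The tessdata location is an assumption (pass whichever directory your Tesseract installation actually reads), and `write_neric_files` is a hypothetical helper name, not part of Tesseract.

```python
from __future__ import annotations

from pathlib import Path

# Config contents copied from the post: disable the system and frequent-word
# dictionaries and load user patterns from tessdata/<lang>.neric.
CONFIG_TEXT = (
    "load_system_dawg     F\n"
    "load_freq_dawg       F\n"
    "user_patterns_suffix neric\n"
)

# One pattern per line: an uppercase letter, seven digits, an uppercase letter.
PATTERN_TEXT = r"\A\d\d\d\d\d\d\d\A" + "\n"


def write_neric_files(tessdata: Path, lang: str = "eng") -> tuple[Path, Path]:
    """Write configs/neric and <lang>.neric under an (assumed) tessdata directory."""
    config_path = tessdata / "configs" / "neric"
    pattern_path = tessdata / f"{lang}.neric"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # write_bytes keeps the "\n" line endings unchanged on every platform.
    config_path.write_bytes(CONFIG_TEXT.encode("utf-8"))
    pattern_path.write_bytes(PATTERN_TEXT.encode("utf-8"))
    return config_path, pattern_path
```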

I have also changed kSaneNumConcreteChars to 0.

I used the command:

tesseract neric.tif neric neric

(This writes the output to neric.txt, using the config neric.)
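If it helps to script this, the same call can be made from Python. This is a sketch that assumes the `tesseract` executable is on PATH and that, as in 3.02, Tesseract appends ".txt" to the output base name; `tesseract_argv` and `run_ocr` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def tesseract_argv(image: str, out_base: str, config: str) -> list[str]:
    """Build the command from the post: tesseract <image> <outbase> <config>."""
    return ["tesseract", image, out_base, config]


def run_ocr(image: str, out_base: str, config: str) -> str:
    """Run tesseract (assumed on PATH) and return the text it wrote to <out_base>.txt."""
    subprocess.run(tesseract_argv(image, out_base, config), check=True)
    return Path(out_base + ".txt").read_text(encoding="utf-8").strip()
```

For example, `run_ocr("neric.tif", "neric", "neric")` runs the same command as above and returns the contents of neric.txt.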

Most of the time it recognises the text correctly, but when the last
character is "B" it returns "8", even though the pattern file specifies "\A"
for that position.

That makes me suspect my understanding has been wrong all along.
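For comparison, my reading of the pattern is that `\A` means an uppercase letter and `\d` a digit, so it should correspond roughly to the regular expression `[A-Z][0-9]{7}[A-Z]` (ASCII only, which is an approximation of Tesseract's own character classes). Until the pattern issue is sorted out, one possible workaround is to check and repair the output afterwards. The sketch below is post-processing only, not a Tesseract feature, and the look-alike table (8/B, 0/O, 1/I, 5/S, 2/Z) is a hypothetical starting point.

```python
from __future__ import annotations

import re

# Hypothetical mapping from the two user-pattern escapes in the post to
# ASCII regex classes (an approximation of Tesseract's character classes).
ESCAPE_TO_CLASS = {"A": "[A-Z]", "d": "[0-9]"}

# Hypothetical look-alike table; tune it for your own font and camera.
DIGIT_TO_UPPER = {"8": "B", "0": "O", "1": "I", "5": "S", "2": "Z"}
UPPER_TO_DIGIT = {upper: digit for digit, upper in DIGIT_TO_UPPER.items()}


def parse_pattern(pattern: str) -> list[str]:
    r"""Split a pattern such as r"\A\d\d\A" into slots: ["A", "d", "d", "A"]."""
    if not re.fullmatch(r"(?:\\[Ad])+", pattern):
        raise ValueError(f"only \\A and \\d escapes are supported: {pattern!r}")
    return re.findall(r"\\([Ad])", pattern)


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate the pattern into a compiled regex, one class per slot."""
    return re.compile("".join(ESCAPE_TO_CLASS[s] for s in parse_pattern(pattern)))


def correct(text: str, pattern: str) -> str | None:
    """Swap look-alike characters to suit each slot; None if it still fails."""
    slots = parse_pattern(pattern)
    if len(text) != len(slots):
        return None
    fixed = []
    for ch, slot in zip(text, slots):
        # Letter slots turn digits into letters; digit slots do the reverse.
        table = DIGIT_TO_UPPER if slot == "A" else UPPER_TO_DIGIT
        fixed.append(table.get(ch, ch))
    result = "".join(fixed)
    return result if pattern_to_regex(pattern).fullmatch(result) else None
```

For example, `correct("X12345678", r"\A\d\d\d\d\d\d\d\A")` turns the trailing "8" into "B" and returns "X1234567B", while input of the wrong length or with an unfixable character returns None.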

Tesseract version: 3.02
Image: neric.tif, captured uncompressed from a webcam using OpenCV's
captureframe, containing only the 9-character string in Helvetica.

I would be most grateful for any pointers.

Thank you all.

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
