Tesseract recognizes "123123", but not "123"

satuon Thu, 22 Nov 2012 08:40:16 -0800

I've finished training Tesseract to recognize 7-segment numbers. I created 
a new language, which I call "dg7", and I launch Tesseract with 
"tesseract input.tif output -l dg7".
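
For repeated tests it can help to script the same invocation. The following is a minimal sketch, assuming the `tesseract` binary is on the PATH and the "dg7" language data is installed; the helper names `build_command` and `run_tesseract` are hypothetical and only wrap the command line quoted above.

```python
import subprocess


def build_command(image, output_base, lang="dg7"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


def run_tesseract(image, output_base, lang="dg7"):
    """Run Tesseract and return the recognized text.

    Assumes Tesseract writes its result to <output_base>.txt, which is
    what the command-line tool does for plain text output.
    """
    subprocess.run(build_command(image, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read().strip()
```

Calling `run_tesseract("input.tif", "output")` on each test screenshot would make it easy to compare the results for "123" and "123123" side by side.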


I'm now testing recognition on screenshots of text in the same font and 
size as the training images, and I've run into a problem: Tesseract 
recognizes an image containing "123123", but not one containing "123".

Any idea why this is happening? Is it normal? Is something wrong with the 
training TIFF/box pair I gave it?

I trained it with a TIFF/box pair generated from this text:

01234567891231278237128938919381212389189310987163534
1231321
-123.41 -0.5 -938.05
-912.4


Maybe the sample is too small? Note that I want to train it to recognize 
only numbers, which can contain digits, a dot, or a minus sign.
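
One quick way to judge whether the sample is too small is to count how many times each character occurs in the training text. The sketch below does that for the four lines listed above; the names `count_samples` and `unexpected_chars` are made up for illustration, and the script says nothing about how many samples Tesseract actually needs.

```python
from collections import Counter

# Training text as listed in this post: four lines of digits, dots and minus signs.
TRAINING_TEXT = """\
01234567891231278237128938919381212389189310987163534
1231321
-123.41 -0.5 -938.05
-912.4
"""

# The only characters the "dg7" language is meant to recognize.
ALLOWED = set("0123456789.-")


def count_samples(text):
    """Count occurrences of every non-whitespace character in the text."""
    return Counter(ch for ch in text if not ch.isspace())


def unexpected_chars(text):
    """Return a sorted list of characters that fall outside the allowed set."""
    return sorted(set(count_samples(text)) - ALLOWED)


if __name__ == "__main__":
    counts = count_samples(TRAINING_TEXT)
    for ch in sorted(ALLOWED):
        print(f"{ch!r}: {counts[ch]} sample(s)")
    print("unexpected characters:", unexpected_chars(TRAINING_TEXT))
```

Characters with only a handful of samples, such as the dot and the minus sign here, are the first candidates for adding more training lines.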


The 7-segment font I'm using for training looks like this:


<https://lh3.googleusercontent.com/-lt6jOZDYLdw/UK47Do5qFHI/AAAAAAAAANM/Pd73mQQJpzQ/s1600/Clipboard02.png>


-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en