I've successfully trained Tesseract on 7-segment digits. I created a new language, which I call "dg7", and I run Tesseract with "tesseract input.tif output -l dg7".
I'm now testing recognition with screenshots of text in the same font and size as the training images, and I've run into a problem: Tesseract recognizes an image containing "123123", but not one containing "123". Any idea why this is happening? Is it normal? Is something wrong with the TIFF/box pair I trained it with?

I generated the training TIFF/box pair from this text:

    01234567891231278237128938919381212389189310987163534
    1231321
    -123.41 -0.5 -938.05
    -912.4

Maybe the sample is too small? Note that I only want it to recognize numbers, which can contain digits, a dot, or a minus sign.

The 7-segment font I'm using for training looks like this: <https://lh3.googleusercontent.com/-lt6jOZDYLdw/UK47Do5qFHI/AAAAAAAAANM/Pd73mQQJpzQ/s1600/Clipboard02.png>
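
For anyone who wants to reproduce the test, here is a minimal Python sketch of how the run could be scripted. It only wraps the command quoted above and checks whether the result looks like a number. The helper names (build_command, is_valid_number, recognize) and the regular expression are my own assumptions for illustration, not part of Tesseract; the only thing taken from Tesseract itself is that it writes its result to the output base name plus ".txt".

```python
import re
import subprocess

# Assumed number format: optional leading minus, digits, optional decimal part,
# e.g. "123", "-0.5", "-938.05". Adjust if your display shows other forms.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def build_command(image_path, output_base, lang="dg7"):
    """Return the argv list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def is_valid_number(text):
    """True if the stripped text is a single number in the assumed format."""
    return NUMBER_RE.fullmatch(text.strip()) is not None


def recognize(image_path, output_base, lang="dg7"):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

Calling recognize on the "123" and "123123" screenshots and passing each result to is_valid_number makes it easy to see which images come back empty or garbled.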

