[tesseract-ocr] Train tesseract for recognition of a dotted font

Dear all,
I'm trying to train tesseract to recognize a dotted (dot-matrix) font like the
one in this image.

<https://lh3.googleusercontent.com/-k1_neF5ZyQw/V2JkBs_4HMI/AAAAAAAAAyI/r_fpKJTN4TwQjPcxyNkg6rts4bAwHGriACLcB/s1600/eng_dotmatrix.dot-matrix.exp0.bmp>

Here is my tif/box file pair, generated with jTessBoxEditor:
eng_dotmatrix.dot-matrix.exp0.tif
<https://drive.google.com/open?id=0B2tu51tmJ0FvdGt2dW93cnR5d00>
eng_dotmatrix.dot-matrix.exp0.box
<https://drive.google.com/open?id=0B2tu51tmJ0FvenJvR3RqWElqaHM>
(I want to train tesseract on this font as a new language, covering only
uppercase letters and digits.)

Then I ran:

tesseract eng_dotmatrix.dot-matrix.exp0.tif eng_dotmatrix.dot-matrix.exp0 box.train

The output was only:

Tesseract Open Source OCR Engine v3.02 with Leptonica

and tesseract did not generate a .tr file.

Is it not possible to train tesseract on fonts where each character is made up
of many small blobs?
I think I could merge the dots into solid blobs by eroding the image, but I
would prefer not to manipulate the image.
Do you have any suggestions?
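
To illustrate the erosion idea, here is a minimal sketch in pure Python. It
assumes a grayscale image stored as a list of rows, with 0 for ink and 255 for
paper. The names erode and count_blobs are hypothetical helpers written for
this example; they are not part of tesseract or Leptonica.

```python
def erode(image, radius=1):
    """Grayscale erosion: each pixel becomes the minimum of its neighbourhood.

    With dark text on a light background (0 = ink, 255 = paper) this grows
    the ink, so dots that lie close together merge into one connected blob.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            row.append(min(image[j][i]
                           for j in range(y0, y1)
                           for i in range(x0, x1)))
        result.append(row)
    return result


def count_blobs(image, ink=0):
    """Count 8-connected components of ink pixels using a flood fill."""
    height, width = len(image), len(image[0])
    seen = set()
    blobs = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != ink or (y, x) in seen:
                continue
            blobs += 1
            seen.add((y, x))
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == ink
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs


# A vertical stroke drawn as three separate dots, two pixels apart.
W, K = 255, 0
stroke = [
    [W, W, W],
    [W, K, W],
    [W, W, W],
    [W, K, W],
    [W, W, W],
    [W, K, W],
    [W, W, W],
]
merged = erode(stroke, radius=1)
print(count_blobs(stroke), count_blobs(merged))
```

The radius would have to be chosen so that dots within one character merge
while neighbouring characters stay apart; the value used here is only for the
toy example.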

O/S: Windows 7
Tesseract Ver: 3.02.02

Regards,
Lee.
