Hello. I have images containing characters that are made up of individual dots, as produced by a dot matrix printer. I tried various operations on the images (binarization, edge detection, dilation, ...) and was able to enlarge the dots so that they connect about 90% of the time. However, recognition is still very poor.
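To make clear what I mean by making the dots bigger, here is a minimal pure-Python sketch of the idea. It is only an illustration, not the code I actually ran: the toy image, the helper names `dilate` and `count_components`, and the radius of 1 are all assumptions chosen for the example.

```python
from collections import deque


def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Paint the neighbourhood of every foreground pixel,
                # clipped at the image border.
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def count_components(img):
    """Count 8-connected foreground blobs using a breadth-first flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


# Toy "dot matrix" stroke: one dot on every second row of a 9x5 image.
dots = [[0] * 5 for _ in range(9)]
for row in range(0, 9, 2):
    dots[row][2] = 1

before = count_components(dots)                   # separate dots
after = count_components(dilate(dots, radius=1))  # dots merged into a stroke
print(before, after)  # prints "5 1"
```

On a real scan the radius would have to match the dot spacing: too small and the dots stay separate, too large and neighbouring characters start to merge.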
This image contains the characters A to L: <https://lh3.googleusercontent.com/-WxgjmUF846M/VEig6eA1FNI/AAAAAAAAAAM/BdQPQPVTUrs/s1600/AL.png>. My modified version is: <https://lh5.googleusercontent.com/-TUZSXsiBHJY/VEihDy5RCUI/AAAAAAAAAAU/HmwIkEemSAY/s1600/AL2.png>. After recognition, Tesseract (3.02, using the .NET wrapper) with the standard English language data returns the characters "FJBEDEFEHIJKL". Only the last 5 characters are right; the rest are wrong.

Do you know of a way to improve recognition other than training a new font for this special case? Tesseract works quite well for my other projects, so I would prefer a solution that does not rely on a special font, if possible.

