Hello everyone. I've already posted this on Stack Overflow but haven't gotten an answer yet (http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv).
I'm looking for a way to accurately OCR a 14-segment display. As you can see in my SO thread, I trained Tesseract on dilated characters, in which all the segments are linked together. My issue is that when I read a character from my webcam, I first have to erode it to remove noise, and then I dilate it. However, I can't dilate it enough to link all the segments together without running into problems with letters like 'B' and 'D', and the letter 'V' is not recognized at all (I believe this is because the gap between its diagonal segments is too large).

- What I trained Tesseract with (this is the letter "V"): http://i.imgur.com/NbmVqkb.png (all segments are linked)
- What I feed Tesseract with: http://i.imgur.com/0E4iXXk.png (some segments are linked, some aren't)

I tried training Tesseract with characters whose segments aren't linked, but it reports "Empty page !!". When I link the segments manually, the training works fine. This feels odd: Tesseract apparently can't be trained on characters with blank space inside them, yet some existing languages (e.g. Arabic or Chinese) already contain such characters.

To work around this, I've tried different image-processing algorithms (such as thinning, so that I can dilate "in height" but not "in width"), but none of them gave more accurate results.

Thank you for your help!

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/451dbd65-20b7-437a-8b5b-a0a726bdad06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
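The "dilate in height but not in width" idea can be tried directly with a rectangular structuring element that is taller than it is wide, without thinning first. Below is a minimal pure-Python sketch on a toy binary grid. The grid and the helper names (`parse`, `show`, `dilate`, `erode`) are hypothetical and only for illustration. In OpenCV the equivalent calls would be `cv2.dilate` / `cv2.erode` with a kernel from `cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))`, where the size tuple is (width, height). The sketch also shows a closing (dilate, then erode with the same kernel). This is a standard morphological operation suggested here as an alternative, not something from the original post.

```python
# Minimal pure-Python sketch of binary morphology with a rectangular
# structuring element. Hypothetical helpers for illustration only; in OpenCV
# the equivalents are cv2.dilate / cv2.erode with a kernel from
# cv2.getStructuringElement(cv2.MORPH_RECT, (width, height)).


def parse(rows):
    """Convert a list of '#'/'.' strings into a 0/1 grid."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]


def show(grid):
    """Convert a 0/1 grid back into '#'/'.' strings."""
    return ["".join("#" if v else "." for v in row) for row in grid]


def _morph(grid, kw, kh, op):
    """Slide a kw-wide, kh-tall rectangle (odd sizes assumed) over the grid.

    op=max gives dilation, op=min gives erosion. Pixels outside the image
    are ignored, so the image border by itself neither erodes nor dilates.
    """
    h, w = len(grid), len(grid[0])
    rx, ry = kw // 2, kh // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                grid[yy][xx]
                for yy in range(max(0, y - ry), min(h, y + ry + 1))
                for xx in range(max(0, x - rx), min(w, x + rx + 1))
            ]
            row.append(op(window))
        out.append(row)
    return out


def dilate(grid, kw, kh):
    return _morph(grid, kw, kh, max)


def erode(grid, kw, kh):
    return _morph(grid, kw, kh, min)


# Toy "character": two vertical strokes one column apart, each broken by a
# one-row gap between an upper and a lower segment.
img = parse([
    "#.#",
    "#.#",
    "#.#",
    "...",
    "#.#",
    "#.#",
    "#.#",
])

if __name__ == "__main__":
    # 1 wide x 3 tall: bridges the vertical gap, the strokes stay separate.
    print("\n".join(show(dilate(img, 1, 3))), end="\n\n")
    # 3 x 3 square: also bridges the gap, but fills the space between the
    # strokes (the kind of merging that can hurt letters like 'B' and 'D').
    print("\n".join(show(dilate(img, 3, 3))), end="\n\n")
    # Closing: dilate, then erode with the same kernel. The bridged gap
    # stays filled while the result shrinks back toward the original outline.
    print("\n".join(show(erode(dilate(img, 1, 3), 1, 3))))
```

Note that a vertical-only kernel does not bridge a purely horizontal gap, so the gaps around the diagonal segments of 'V' may need a different kernel shape.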

