OK, so I just tried again after resizing my image by a factor of 2 and of 4, and it still doesn't work: tesseract says "Empty page!!". However, if I manually link the segments (with the brush tool in Gimp, see http://i.imgur.com/akVmAgh.png ), it works, but that doesn't feel like good training data for tesseract. Any advice?
Thank you

On Monday, 6 July 2015 09:18:44 UTC+2, Pierre-Henri DAUVERGNE wrote:
> Hi, thank you for your answer :)
>
> Each character is about 100x160 pixels; is that too small? I'll try with
> bigger ones and post the results here.
>
> On Saturday, 4 July 2015 04:10:18 UTC+2, Art Rhyno wrote:
>> Hi,
>>
>> I wonder if it has something to do with the sizing of the characters in
>> the image that you are using for font training. I swapped out the character
>> without the linked segments for a character in a set I am using and it
>> seemed to work ok. The set is too big for the list but I have attached the
>> image I used.
>>
>> art
>>
>> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Pierre-Henri DAUVERGNE
>> *Sent:* Friday, July 03, 2015 10:20 AM
>> *To:* [email protected]
>> *Subject:* [tesseract-ocr] Train tesseract for 14-segment display
>>
>> Hello everyone.
>>
>> I've already posted on stackoverflow but haven't had an answer yet
>> ( http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv ).
>>
>> I'm looking for a way to accurately OCR a 14-segment display. As you can
>> see in my SO thread, I trained tesseract with dilated characters, so that
>> all of each character's segments are linked together. My issue is that when
>> I read a character from my webcam, I have to erode it first to remove
>> noise. After that, I dilate it.
>> However, I can't dilate it enough to link all the segments together without
>> having issues with letters like 'B' and 'D', and the letter 'V' is not
>> recognized at all (I believe this is because the gap between the diagonal
>> segments is too large).
>>
>> · What I trained tesseract with (that's the letter "V"):
>> http://i.imgur.com/NbmVqkb.png (segments are all linked)
>>
>> · What I feed tesseract with: http://i.imgur.com/0E4iXXk.png
>> (some segments are linked, some aren't)
>>
>> I tried to train tesseract with characters whose segments aren't all
>> linked, but it says "Empty page !!". When I manually link the segments, the
>> training works fine (it feels weird that tesseract can't be trained with
>> blank space inside characters, since some of the existing languages, e.g.
>> Arabic or Chinese, already have some).
>>
>> To bypass this issue, I've been trying different kinds of image processing
>> algorithms (like thinning, in order to dilate "in height" but not "in
>> width"), but none gave more accurate results.
>>
>> Thank you for your help!

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4f0135b3-ced6-439c-8272-66299e6c2a03%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
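
For anyone following the thread: the idea of dilating "in height" but not "in width" can be approximated by a morphological closing (dilate, then erode) with a tall, one-pixel-wide structuring element. Below is a minimal pure-Python sketch of that idea on a toy binary image. It is not the poster's code; the function names (`dilate`, `erode`, `close_vertical`, `count_components`) and the 7x5 test image are made up for illustration. With OpenCV, the same operation would presumably be `cv2.morphologyEx` with `cv2.MORPH_CLOSE` and a kernel from `cv2.getStructuringElement(cv2.MORPH_RECT, ...)`. Whether this is enough to get past "Empty page!!" during training is not something the thread establishes.

```python
def _window(img, y, x, rh, rw):
    """Yield pixels under a (2*rh+1) x (2*rw+1) kernel centred on (y, x),
    clipped to the image bounds."""
    h, w = len(img), len(img[0])
    for yy in range(max(0, y - rh), min(h, y + rh + 1)):
        for xx in range(max(0, x - rw), min(w, x + rw + 1)):
            yield img[yy][xx]


def dilate(img, kh, kw):
    """Binary dilation: a pixel is set if ANY pixel under the kernel is set."""
    return [[1 if any(_window(img, y, x, kh // 2, kw // 2)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def erode(img, kh, kw):
    """Binary erosion: a pixel is set only if ALL pixels under the kernel are set."""
    return [[1 if all(_window(img, y, x, kh // 2, kw // 2)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def close_vertical(img, kh):
    """Closing with a kh x 1 kernel: bridges vertical gaps, leaves horizontal ones."""
    return erode(dilate(img, kh, 1), kh, 1)


def count_components(img):
    """Count 4-connected foreground blobs with an iterative flood fill."""
    h, w = len(img), len(img[0])
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and (y, x) not in seen:
                count += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


# Toy "display": column 1 is a stroke broken by a 2-pixel vertical gap,
# column 3 is an unbroken stroke one empty column away.
img = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]

closed = close_vertical(img, 3)  # kernel 3 pixels tall, 1 pixel wide
print(count_components(img), count_components(closed))  # 3 2
```

The kernel height has to be at least the gap size plus one to bridge it. Keeping the width at 1 is what stops neighbouring strokes (or the bowls of 'B' and 'D') from merging sideways, though diagonal segments like those in 'V' may still need a different kernel shape.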

