I can't show you all of the characters, but I can give you a sample. I have the 10 digits and all the letters. I tried decreasing the size of the characters, but it still didn't work. Tesseract didn't say "Empty page!!" but "Failure ! Couldn't find a matching blob" for all the letters; the digits worked fine.
Here is a small sample: http://i.imgur.com/NeYBKrj.png (the letters are V, X, B, C, D).

Thank you for your help :)

On Tuesday, 7 July 2015 13:40:24 UTC+2, Art Rhyno wrote:
> Could you attach the “my_font_exp0.png” and “my_font_exp0.box” that are
> producing the “Empty page!!” message?
>
> art
>
> *From:* [email protected] *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Tuesday, July 07, 2015 3:26 AM
> *To:* [email protected]
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
> Actually, I followed this guide
> <http://blog.ayoungprogrammer.com/2013/01/equation-ocr-part-2-training-characters.html>
> which is essentially the same as the one you gave me. I am doing all of that.
> I use qt-box-editor to manually set the boxes over the characters, then I
> run the command "tesseract my_font_exp0.png my_font_exp0 nobatch box.train",
> but it says "Empty page!!" and nothing else. It creates an empty .txt file.
> Whenever I try to train with linked segments, it works.
> That's why I'm looking for either an image-processing way of linking all the
> segments as they should be, or a tesseract way of training with unlinked
> segments.
>
> On Monday, 6 July 2015 14:41:22 UTC+2, Art Rhyno wrote:
>
> Hi,
>
> I am guessing my attachment didn’t make it to the list, but the character I
> used is about 17x25 pixels. I resaved the sample as a PNG (instead of a
> TIFF) and am trying again. Remember that you can (and often have to) edit
> the box files for training. Tesseract may split your character into more
> than one blob, but you can override this. By default, “makebox” produced:
>
> l 45 254 53 279 0
> ’ 55 267 62 277 0
>
> But I modified this to be:
>
> V 45 254 62 279 0
>
> I found this blog post really helpful for training [1]. You can contact me
> off-list if you want the entire training set I used, but I only did the one
> character.
>
> art
>
> ---
> 1. http://michaeljaylissner.com/blog/adding-new-fonts-to-tesseract-3-ocr-engine
>
> *From:* [email protected] *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Monday, July 06, 2015 4:29 AM
> *To:* [email protected]
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
> OK, so I just tried again after resizing my image by 2 and by 4, and it still
> doesn't work: tesseract says "Empty page!!".
> However, if I manually link the segments (with the brush tool in Gimp, see
> http://i.imgur.com/akVmAgh.png ), it works, but it doesn't feel
> like good training for tesseract.
> Any advice?
>
> Thank you
>
> On Monday, 6 July 2015 09:18:44 UTC+2, Pierre-Henri DAUVERGNE wrote:
>
> Hi, thank you for your answer :)
>
> Each character is about 100x160 pixels. Is that too small? I'll try with
> bigger ones and post the results here.
>
> On Saturday, 4 July 2015 04:10:18 UTC+2, Art Rhyno wrote:
>
> Hi,
>
> I wonder if it has something to do with the sizing of the characters in
> the image that you are using for font training. I swapped out the character
> without the linked segments for a character in a set I am using and it
> seemed to work OK. The set is too big for the list, but I have attached the
> image I used.
>
> art
>
> *From:* [email protected] *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Friday, July 03, 2015 10:20 AM
> *To:* [email protected]
> *Subject:* [tesseract-ocr] Train tesseract for 14-segment display
>
> Hello everyone.
>
> I've already posted on Stack Overflow but haven't had an answer yet (
> http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv
> ).
>
> I'm looking for a way to accurately OCR a 14-segment display. As you can see
> in my SO thread, I trained tesseract with dilated characters, which links all
> of their segments together.
> My issue is that when I read a character from my webcam, I have to erode it
> first to remove noise. After that, I dilate it.
> However, I can't dilate it enough to link all the segments together without
> causing problems with letters like 'B' and 'D', and the letter 'V' is not
> recognized at all (I believe this is because the gap between the
> diagonal segments is too large).
>
> · What I trained tesseract with (that's the letter "V"):
> http://i.imgur.com/NbmVqkb.png (segments are all linked)
>
> · What I feed tesseract with: http://i.imgur.com/0E4iXXk.png
> (some segments are linked, some aren't)
>
> I tried to train tesseract with characters whose segments aren't
> linked, but it says "Empty page !!". When I manually link the segments, the
> training works fine. It feels odd that tesseract can't be trained with
> blank space inside characters, since some of the existing languages (e.g.
> Arabic or Chinese) already have characters like that.
>
> To work around this issue, I've been trying different kinds of image-processing
> algorithms (like thinning, in order to dilate "in height" but not "in
> width"), but none gave more accurate results.
>
> Thank you for your help!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/451dbd65-20b7-437a-8b5b-a0a726bdad06%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
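The box-file edit described in the quoted thread (replacing the two boxes that “makebox” produced with a single "V" box) amounts to taking the union of the bounding boxes. Here is a minimal Python sketch of that step, assuming the Tesseract 3.x box-file layout of `char left bottom right top page`. The names `Box`, `parse_box_line`, `merge_boxes` and `format_box` are made up for this illustration and are not part of Tesseract or qt-box-editor.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One line of a box file (hypothetical helper type for this sketch)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Assumed layout: "<char> <left> <bottom> <right> <top> <page>".
    # Split from the right so the character field is taken as-is.
    char, left, bottom, right, top, page = line.strip().rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def merge_boxes(boxes: List[Box], char: str) -> Box:
    # The merged box is the smallest rectangle covering every fragment.
    return Box(
        char,
        min(b.left for b in boxes),
        min(b.bottom for b in boxes),
        max(b.right for b in boxes),
        max(b.top for b in boxes),
        boxes[0].page,
    )


def format_box(box: Box) -> str:
    return f"{box.char} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


if __name__ == "__main__":
    fragments = [
        parse_box_line("l 45 254 53 279 0"),
        parse_box_line("’ 55 267 62 277 0"),
    ]
    print(format_box(merge_boxes(fragments, "V")))  # V 45 254 62 279 0
```

With the two lines from the thread as input, this reproduces the hand-edited line. It only rewrites the box file; whether training then accepts the unlinked segments still has to be checked by running box.train.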

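The original post mentions wanting to dilate "in height" but not "in width". One way to express that idea is a morphological closing (dilation followed by erosion) with a purely vertical structuring element. The sketch below does this in plain Python on a small 0/1 grid so that it runs without extra libraries. It is an illustration under that assumption, not the poster's code. The function names and the `reach` parameter are hypothetical, and in practice OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` and a tall, narrow kernel would be the usual tool.

```python
from typing import List

Grid = List[List[int]]  # rows of 0/1 pixels


def dilate_vertical(img: Grid, reach: int) -> Grid:
    # A pixel becomes 1 if any pixel in the same column, within
    # `reach` rows above or below it, is 1.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        lo, hi = max(0, y - reach), min(h, y + reach + 1)
        for x in range(w):
            out[y][x] = int(any(img[r][x] for r in range(lo, hi)))
    return out


def erode_vertical(img: Grid, reach: int) -> Grid:
    # A pixel stays 1 only if every pixel in the same column window is 1.
    # Rows outside the image are ignored, so the border is not eroded.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        lo, hi = max(0, y - reach), min(h, y + reach + 1)
        for x in range(w):
            out[y][x] = int(all(img[r][x] for r in range(lo, hi)))
    return out


def close_vertical(img: Grid, reach: int) -> Grid:
    # Closing = dilation then erosion: bridges small vertical gaps
    # while leaving horizontal gaps untouched.
    return erode_vertical(dilate_vertical(img, reach), reach)


if __name__ == "__main__":
    # Two vertical strokes, each broken by a one-pixel gap,
    # separated horizontally by an empty column.
    segments = [
        [1, 0, 1],
        [0, 0, 0],
        [1, 0, 1],
    ]
    for row in close_vertical(segments, 1):
        print(row)
```

In this toy input the vertical gaps are bridged and the empty middle column stays empty. A vertical-only element does not obviously help with diagonal segments such as those in 'V', so this would need testing on real frames.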
