I can't show you all the characters, but I can give you a sample. I 
have the 10 digits and all the letters. I tried decreasing the size of the 
characters, but it still didn't work. This time Tesseract didn't say "Empty page!!" 
but "Failure ! Couldn't find a matching blob" for every letter; the digits 
worked fine.

Here is a small sample: http://i.imgur.com/NeYBKrj.png (the letters are V, X, 
B, C and D).

Thank you for your help :)


On Tuesday, July 7, 2015 at 13:40:24 UTC+2, Art Rhyno wrote:
>
>  Could you attach the “my_font_exp0.png” and “my_font_exp0.box” that are 
> producing the “Empty page!!” message? 
>
>  
>
> art
>
>  
>
> *From:* [email protected] [mailto:[email protected]] 
> *On Behalf Of *Pierre-Henri DAUVERGNE
> *Sent:* Tuesday, July 07, 2015 3:26 AM
> *To:* [email protected]
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
>  
>  
> Actually, I followed this guide 
> <http://blog.ayoungprogrammer.com/2013/01/equation-ocr-part-2-training-characters.html>
>  
> which is essentially the same as the one you gave me, and I am doing all of that. 
> I use qt-box-editor to manually set the boxes over the characters, then I 
> run the command "tesseract my_font_exp0.png my_font_exp0 nobatch box.train", 
> but it says "Empty page!!" and nothing else, and it creates an empty .txt file. 
> Whenever I train with linked segments, it works. 
> That's why I'm looking for either an image-processing way of linking all the 
> segments as they should be, or a Tesseract way of training with unlinked 
> segments.
>
>
>
> On Monday, July 6, 2015 at 14:41:22 UTC+2, Art Rhyno wrote:
>
>  Hi,
>
>  
>
> I am guessing my attachment didn’t make it to the list but the character I 
> used is about 17x25 pixels.  I resaved the sample as a PNG (instead of a 
> TIFF) and am trying again. Remember that you can (and often have to) edit 
> the box files for training. Tesseract may split your character into more 
> than one blob, but you can override this. By default, the “makebox” 
> produced:
>
>  
>
> l 45 254 53 279 0
>
> ’ 55 267 62 277 0
>
>  
>
> But I modified this to be:
>
> V 45 254 62 279 0
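
The box-file edit above can be sketched in Python. This is only an illustration: `merge_boxes` is a hypothetical helper, not part of Tesseract, and it assumes each box-file line has the form `<char> <left> <bottom> <right> <top> <page>`. It replaces several boxes that split one glyph with a single box covering their union.

```python
def merge_boxes(lines, label):
    """Merge several box-file lines into one box labelled `label`.

    Assumed line format: <char> <left> <bottom> <right> <top> <page>
    """
    if not lines:
        raise ValueError("no boxes to merge")
    rects = []
    pages = set()
    for line in lines:
        # Split from the right so the character field is left untouched.
        _char, left, bottom, right, top, page = line.rsplit(" ", 5)
        rects.append((int(left), int(bottom), int(right), int(top)))
        pages.add(page)
    if len(pages) != 1:
        raise ValueError("boxes are on different pages")
    # The merged box is the smallest rectangle containing every input box.
    left = min(r[0] for r in rects)
    bottom = min(r[1] for r in rects)
    right = max(r[2] for r in rects)
    top = max(r[3] for r in rects)
    return f"{label} {left} {bottom} {right} {top} {pages.pop()}"


# The two boxes that makebox produced for the split "V" in this thread.
merged = merge_boxes(["l 45 254 53 279 0", "’ 55 267 62 277 0"], "V")
print(merged)  # V 45 254 62 279 0
```

The output matches the hand-edited line, so a helper like this could save some manual work when every segmented glyph is split the same way.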
>
>  
>
> I found this blog post really helpful for training [1]. You can contact me 
> off-list if you want the entire training set I used, but I only did the one 
> character.
>
>  
>
> art
>
> ---
>
> 1. 
> http://michaeljaylissner.com/blog/adding-new-fonts-to-tesseract-3-ocr-engine
>
>  
>
> *From:* [email protected] [mailto:[email protected]] *On 
> Behalf Of *Pierre-Henri DAUVERGNE
> *Sent:* Monday, July 06, 2015 4:29 AM
> *To:* [email protected]
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
>  
>  
> OK, so I just tried again after scaling my image by 2 and by 4, and it still 
> doesn't work: Tesseract says "Empty page!!".
> However, if I manually link the segments (with the brush tool in GIMP, see 
> here: http://i.imgur.com/akVmAgh.png ), it works, but it doesn't feel 
> like good training data for Tesseract.
> Any advice?
>
> Thank you
>
> On Monday, July 6, 2015 at 09:18:44 UTC+2, Pierre-Henri DAUVERGNE wrote:
>
>  Hi, thank you for your answer :)
>
> Each character is about 100x160 pixels; is that too small? I'll try with 
> bigger ones and post the results here.
>
> On Saturday, July 4, 2015 at 04:10:18 UTC+2, Art Rhyno wrote:
>
>  Hi,
>
>  
>
> I wonder if it has something to do with the sizing of the characters in 
> the image that you are using for font training. I swapped out the character 
> without the linked segments for a character in a set I am using and it 
> seemed to work ok. The set is too big for the list but I have attached the 
> image I used. 
>
>  
>
> art
>
>  
>
> *From:* [email protected] [mailto:[email protected]] *On 
> Behalf Of *Pierre-Henri DAUVERGNE
> *Sent:* Friday, July 03, 2015 10:20 AM
> *To:* [email protected]
> *Subject:* [tesseract-ocr] Train tesseract for 14-segment display
>
>  
>  
> Hello everyone.
>
> I've already posted on Stack Overflow but haven't had an answer yet (
> http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv
> ).
>
> I'm looking for a way to accurately OCR a 14-segment display. As you can see 
> in my SO thread, I trained Tesseract with dilated characters, so that all 
> of each character's segments are linked together. My issue is that when I read a 
> character from my webcam, I have to erode it first to remove noise, and only 
> then dilate it.
> However, I can't dilate enough to link all the segments together without 
> causing issues with letters like 'B' and 'D', and the letter 'V' is not 
> recognized at all (I believe that is because the gap between the 
> diagonal segments is too long).
>
> ·        What I trained tesseract with (that's the "V" letter) : 
> http://i.imgur.com/NbmVqkb.png (segments are all linked)
>
> ·        What I feed tesseract with : http://i.imgur.com/0E4iXXk.png 
> (some segments are linked, some aren't)
>
> I tried to train Tesseract with characters whose segments aren't all 
> linked, but it says "Empty page!!". When I manually link the segments, the 
> training works fine. (It feels odd that Tesseract can't be trained with 
> blank space inside characters, since some of the existing languages, e.g. 
> Arabic or Chinese, already have some.)
>
> To work around this issue, I've been trying different kinds of image-processing 
> algorithms (such as thinning, in order to dilate "in height" but not in 
> "width"), but none gave more accurate results.
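
To make the "dilate in height but not in width" idea concrete, here is a minimal pure-Python sketch of a vertical-only binary dilation. The function name `dilate_vertical` and the small sample image are hypothetical; a real pipeline would more likely pass a tall, narrow structuring element to OpenCV's `cv2.dilate`. The image is a list of rows, with 1 marking a lit segment pixel.

```python
def dilate_vertical(img, radius):
    """Binary dilation with a (2*radius+1) x 1 vertical structuring element.

    A pixel becomes 1 if any pixel in the same column, within `radius`
    rows above or below it, is 1. Columns never bleed into each other.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        lo = max(0, y - radius)
        hi = min(h, y + radius + 1)
        for x in range(w):
            if any(img[yy][x] for yy in range(lo, hi)):
                out[y][x] = 1
    return out


# Hypothetical sample: one vertical stroke made of two segments
# separated by a 1-pixel gap, as on a segmented display.
stroke = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],  # gap between the two segments
    [0, 1, 0],
    [0, 1, 0],
]

linked = dilate_vertical(stroke, 1)
for row in linked:
    print("".join("#" if v else "." for v in row))
```

With a radius of 1 the gap is closed while the stroke keeps its original width. A purely vertical element like this may not close gaps along diagonal segments, such as those of a 'V'.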
>
> Thank you for your help !
>  
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/451dbd65-20b7-437a-8b5b-a0a726bdad06%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/tesseract-ocr/451dbd65-20b7-437a-8b5b-a0a726bdad06%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>  
