The font does not look like that: look at the shape of the 0, which has a strikethrough in your image but not in Lucida, or at the shape of the M. I am not sure font training will do a lot here. I think the problem is more the quality of the edges in your image, with the dot-matrix printing (or however it's printed) producing uncertain edges.
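If the stub is in fact dot-matrix printed, as suggested above, each stroke is really a row of separate dots, and a common way to reconnect them before OCR (not something anyone in this thread reports trying) is a morphological closing: dilate the text pixels, then erode them by the same amount. The sketch below assumes the image has already been binarized into a NumPy boolean array with text pixels set to `True`; `close_gaps` and the `radius` value are illustrative choices to tune per image, not part of Tesseract.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1)-pixel square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    # Pixels outside the image count as background during dilation.
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the image inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary erosion, written as the dual of dilation (dilate the background).

    Outside the image counts as foreground here, so text touching the
    border is not eroded away.
    """
    mask = np.asarray(mask, dtype=bool)
    return ~dilate(~mask, radius)


def close_gaps(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Morphological closing: dilate, then erode.

    Bridges gaps of up to 2*radius pixels between text pixels while
    keeping every original text pixel set.
    """
    return erode(dilate(mask, radius), radius)
```

A larger `radius` joins wider gaps but also starts merging neighbouring characters, so it is worth starting small. If OpenCV is available, `cv2.morphologyEx` with `cv2.MORPH_CLOSE` does the same job faster; the NumPy version only keeps the example dependency-free.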
Perhaps others can chip in.

On 20 August 2015 at 10:31, Amit Rao <[email protected]> wrote:
> Thanks, Allistair. I was guessing that this font was similar to Lucida
> Console, e.g.:
>
> https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A
>
> However, I don't know for certain what font this is, and I don't know of
> a tool that would tell me for sure which font the image uses. The only
> text I am really interested in is "HH:MM AM/PM", but if I crop the image
> to include only the time, Tesseract is still not able to read it, similar
> to what you reported. I cropped the image to include 09:43 AM and it
> reads it as *@9243 Rh*
>
> If this is a font that Tesseract does not recognize, would it help to
> augment the training data set with images in this format and font?
>
> Thanks,
> amit
>
> On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>>
>> Which Lucida font do you think this is? All Lucida fonts I see in a
>> Google Image search are nothing like this.
>>
>> You're right, this does not OCR well. In fact, if you just crop out a
>> part of it to remove other noise, say, 09:43 AM, even with lots of
>> margin Tesseract doesn't find anything it thinks looks like text in
>> normal page segmentation.
>>
>> The best I got (for the cropped-out time) was:
>>
>> 39:43 HH
>>
>> So 28% incorrect.
>>
>> The definition of the 'M' is quite eroded already, which is not great.
>>
>> On 20 August 2015 at 08:29, Amit Rao <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking
>>> stubs are primarily in 2 formats. Tesseract does quite well on one of
>>> the formats, but the OCR text for the second format is pretty much
>>> useless. I have attached the image that Tesseract is unable to OCR.
>>> If someone is able to report any success with OCRing this image, I
>>> would really appreciate it. So far I have tried the following, but
>>> they do not help with the OCR results:
>>>
>>> 1. Cropping the image
>>> 2. Reducing the height and width of the image with same/different
>>>    aspect ratio
>>> 3. Binarizing the image into black and white
>>> 4. Filtering the image to smooth it
>>>
>>> I haven't tried augmenting the training data set yet. The font seems
>>> to be pretty standard (Lucida), and my understanding is that unless
>>> the fonts are non-standard, augmenting the training data will not be
>>> very useful.
>>>
>>> Your help/suggestions will be greatly appreciated.
>>>
>>> Thank you,
>>> Amit Rao
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
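Amit's list above includes shrinking the image and binarizing it. Two variations often suggested for small printed text are enlarging the crop rather than shrinking it, and choosing the binarization threshold automatically with Otsu's method instead of a fixed cut-off. The sketch below shows both using NumPy only; it assumes an 8-bit grayscale crop given as a 2-D `uint8` array, and the helper names and the default scale factor of 3 are hypothetical choices. Whether this helps on this particular stub is untested.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Pixels <= t form one class and pixels > t the other; t is chosen to
    maximize the between-class variance of the intensity histogram.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    levels = np.arange(256, dtype=np.float64)
    weight_bg = np.cumsum(hist)            # pixel count at or below t
    weight_fg = total - weight_bg          # pixel count above t
    cum_sum = np.cumsum(hist * levels)     # intensity sum at or below t
    # np.maximum(..., 1) avoids division by zero for empty classes;
    # those thresholds score 0 anyway because a weight is 0.
    mean_bg = cum_sum / np.maximum(weight_bg, 1)
    mean_fg = (cum_sum[-1] - cum_sum) / np.maximum(weight_fg, 1)
    between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    return int(np.argmax(between))


def upscale(gray: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour enlargement by an integer factor."""
    return np.repeat(np.repeat(gray, factor, axis=0), factor, axis=1)


def preprocess(gray: np.ndarray, factor: int = 3) -> np.ndarray:
    """Enlarge, then binarize: dark text becomes 0, light paper 255."""
    big = upscale(gray, factor)
    t = otsu_threshold(big)
    return np.where(big > t, 255, 0).astype(np.uint8)
```

The result keeps dark text on a light background and can be saved with Pillow's `Image.fromarray` for a desktop Tesseract run. On iOS the same steps would have to be ported to whatever image library the app already uses; the Python version is only meant for quick experiments.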

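Allistair describes his best cropped result as "28% incorrect", and Amit's crop came back as *@9243 Rh*. To compare preprocessing variants on a consistent scale, one standard measure is the character error rate (CER): the Levenshtein edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The sketch below is a plain-Python implementation; the function names are illustrative, and the thread does not say how its 28% figure was computed, so it need not match this definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "09:43 AM"
    for ocr_output in ("39:43 HH", "@9243 Rh"):
        print(ocr_output, f"{char_error_rate(truth, ocr_output):.1%}")
```

Run as a script, this prints 37.5% for "39:43 HH" (three substitutions in eight characters) and 50.0% for "@9243 Rh" (four substitutions).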

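Since Amit only needs the "HH:MM AM/PM" field, a strict format check after OCR at least turns silent misreads such as *@9243 Rh* into detectable failures that an app could flag for a retry or manual entry. The sketch below assumes the stub always prints a zero-padded 12-hour time like "09:43 AM"; the pattern and the function name are hypothetical. On the Tesseract side, restricting recognition to the characters that can occur (the `tessedit_char_whitelist` variable) and treating the crop as a single text line (page segmentation mode 7) are related options; neither was reported in the thread, and neither is guaranteed to help with eroded glyphs.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the stub's time field: zero-padded 12-hour clock,
# optional single space, then AM or PM, e.g. "09:43 AM".
TIME_PATTERN = re.compile(r"^(0[1-9]|1[0-2]):([0-5][0-9]) ?([AP]M)$")


def parse_stub_time(ocr_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (hour, minute, "AM"/"PM") if the OCR text is a valid
    12-hour time, or None so the caller can treat the read as failed."""
    match = TIME_PATTERN.match(ocr_text.strip())
    if match is None:
        return None
    hour, minute, meridiem = match.groups()
    return int(hour), int(minute), meridiem
```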