Which Lucida font do you think this is? None of the Lucida fonts I see in a Google Image search look anything like this.

You're right, this does not OCR well. In fact, if you crop out just one part of it to remove other noise, say 09:43 AM, Tesseract finds nothing it thinks looks like text in normal page segmentation, even with lots of margin.

The best I got (for the cropped-out time) was:

39:43 HH

So 28% incorrect. The definition of the 'M' is already quite eroded, which is not great.

On 20 August 2015 at 08:29, Amit Rao <[email protected]> wrote:
> Hi folks,
>
> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs
> are primarily in 2 formats. Tesseract does quite well on one of the
> formats, but the OCR text for the second format is pretty much useless.
> I have attached the image that Tesseract is unable to OCR. If someone is
> able to report any success with OCRing this image, I would really
> appreciate it. So far I have tried the following, but none of it helps
> the OCR results:
>
> 1. Cropping the image
> 2. Reducing the height and width of the image, with the same or a
>    different aspect ratio
> 3. Binarizing the image into black and white
> 4. Filtering the image to smooth it
>
> I haven't tried augmenting the training data set yet. The font seems to
> be pretty standard (Lucida), and my understanding is that unless the
> fonts are non-standard, augmenting the training data will not be very
> useful.
>
> Your help/suggestions will be greatly appreciated.
>
> Thank you,
> Amit Rao
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
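
For anyone wanting to repeat the cropped-time experiment outside the iOS SDK: since normal page segmentation found no text, one thing that may be worth trying is telling Tesseract the crop is a single text line (page segmentation mode 7). Below is a minimal sketch using the Tesseract command-line tool from Python. The helper names and the file name `time_crop.png` are made up for illustration, and nothing here has been run against the attached stub, so it may not improve the result at all.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, psm=7, legacy_flag=False):
    """Build a Tesseract CLI command that prints recognized text to stdout.

    psm=7 asks Tesseract to treat the image as a single text line.
    legacy_flag=True uses the single-dash `-psm` spelling that older 3.x
    releases expect; newer releases use `--psm`.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", str(image_path), "stdout", flag, str(psm)]


def ocr_single_line(image_path, **kwargs):
    """Run Tesseract on one cropped line (hypothetical helper).

    Returns the recognized text, or None if the tesseract binary
    is not installed on this machine.
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_cmd(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()
```

Usage would be something like `ocr_single_line("time_crop.png")`, or `ocr_single_line("time_crop.png", legacy_flag=True)` on an older 3.x install. Building the command separately from running it keeps the flag handling easy to check without Tesseract installed.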

