Which Lucida font do you think this is? None of the Lucida fonts I see in a
Google Image search look anything like this.

You're right, this does not OCR well. In fact, if you crop out just one part
of it to remove the other noise, say the 09:43 AM, then even with plenty of
margin Tesseract doesn't find anything it thinks looks like text in the
normal page segmentation mode.
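For anyone who wants to repeat this on their own crop, here is a rough sketch
of running the Tesseract command-line tool with a non-default page
segmentation mode. It assumes the `tesseract` binary is on the PATH and that
the crop was saved as time_crop.png (a made-up file name); the helper names
are illustrative too. Mode 7 treats the image as a single text line. Recent
releases spell the flag `--psm`, older 3.x releases used `-psm`.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=7, lang="eng"):
    """Build the argument list for one Tesseract CLI run.

    psm=7 asks Tesseract to treat the image as a single text line
    instead of running the default automatic page segmentation (3).
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_crop(image_path, psm=7):
    """Run Tesseract on a cropped image and return the recognised text.

    Assumes the tesseract binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical file name; replace with the path to your own crop.
    print(build_tesseract_cmd("time_crop.png"))
```

Calling ocr_crop("time_crop.png", psm=8) would try the single-word mode
instead, which can also be worth a look on very small crops.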

The best I got (for the cropped out time) was:

39:43 HH

So 28% incorrect.
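A minimal sketch of one way to put a number on this, assuming a
character-level edit distance (Levenshtein) divided by the length of the
expected string. The function names are illustrative, and the percentage
depends on the convention chosen (whether the space counts, positional
comparison versus edit distance), so it will not necessarily match a figure
worked out by hand.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(expected, recognized):
    """Edit distance divided by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return levenshtein(expected, recognized) / len(expected)


if __name__ == "__main__":
    rate = char_error_rate("09:43 AM", "39:43 HH")
    print(f"{rate:.1%}")
```

For the crop above, this convention counts 3 substitutions over the 8
characters of "09:43 AM" (the space included).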

The definition of the 'M' is already quite eroded, which is not great.
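One thing that may be worth trying when strokes are eroded like this is a
small morphological dilation of the ink pixels before OCR. The following is
only a sketch in plain Python on a binarized image held as a list of rows
(1 = ink, 0 = background); the `dilate` helper and its `radius` parameter are
illustrative, it has not been checked against this stub, and too large a
radius will close up the gaps inside letters.

```python
def dilate(ink, radius=1):
    """Thicken strokes in a binary image.

    ink is a list of rows of 0/1 values, where 1 marks an ink pixel.
    Every ink pixel is spread over the square neighbourhood of the
    given radius. A new grid is returned; the input is not modified.
    """
    height = len(ink)
    width = len(ink[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not ink[y][x]:
                continue
            # Clamp the neighbourhood to the image borders.
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[yy][xx] = 1
    return out


if __name__ == "__main__":
    thin = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in dilate(thin):
        print("".join("#" if v else "." for v in row))
```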



On 20 August 2015 at 08:29, Amit Rao <[email protected]> wrote:

> Hi folks,
>
> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs are
> primarily in 2 formats. Tesseract does quite well on one of the formats, but
> the OCR text for the second format is pretty much useless. I have attached
> the image that Tesseract is unable to OCR. If someone is able to report any
> success with OCRing this image, I would really appreciate it. So far I have
> tried the following, but none of it helps with the OCR results.
>
> 1. Cropping the image
> 2. Reducing the height and width of the image with same/different aspect
> ratio
> 3. Binarizing the image into black and white
> 4. Filtering the image to smooth it.
>
> I haven't tried augmenting the training data set yet. The font seems to be
> pretty standard (Lucida), and my understanding is that unless the fonts are
> non-standard, augmenting the training data will not be very useful.
>
> Your help/suggestions will be greatly appreciated.
>
> Thank you,
> Amit Rao
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>

