Re: [tesseract-ocr] Any suggestions with getting Tesseract to OCR this image?

Amit Rao Thu, 20 Aug 2015 02:53:18 -0700

Thanks, Allistair. I was guessing that this font was similar to Lucida 
Console. e.g.


https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&bih=761&tbm=isch&tbo=u&source=univ&sa=X&sqi=2&ved=0CCUQsARqFQoTCKrkzcypt8cCFQddHgod7XsPDw#imgrc=H27K5k9g7hx19M%3A

However, I don't know for certain what font this is, and I don't know of a 
tool that would identify the font used in the image. The only text I am 
really interested in is the time ("HH:MM AM/PM"), but even if I crop the 
image to include only the time, Tesseract still cannot read it, similar to 
what you reported. I cropped the image to include 09:43 AM and it reads it 
as  *@9243 Rh*.
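
Since the text I care about has a fixed shape, one way to flag bad reads 
automatically is to check the OCR output against the "HH:MM AM/PM" pattern 
and measure how far it is from a known answer. A minimal Python sketch, 
assuming a 12-hour clock with a zero-padded hour; the names looks_like_time, 
edit_distance and char_error_rate are made up for illustration and are not 
part of Tesseract:

```python
import re

# Assumed shape of the timestamp on the stub: zero-padded 12-hour time,
# one space, then AM or PM (for example "09:43 AM").
TIME_PATTERN = re.compile(r"(0[1-9]|1[0-2]):[0-5][0-9] (AM|PM)")


def looks_like_time(text: str) -> bool:
    """Return True if the whole string matches the assumed HH:MM AM/PM shape."""
    return TIME_PATTERN.fullmatch(text.strip()) is not None


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def char_error_rate(expected: str, actual: str) -> float:
    """Edit distance divided by the length of the expected string."""
    return edit_distance(expected, actual) / len(expected)
```

With the strings from this thread, looks_like_time("@9243 Rh") is false, and 
the character error rate against "09:43 AM" comes out as 4/8 = 0.5.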

If this is a font that Tesseract does not recognize, would it help to 
augment the training data set with images in this format and font? 

Thanks,
amit



On Thursday, August 20, 2015 at 4:34:25 AM UTC-4, Allistair C wrote:
>
> Which Lucida font do you think this is? All Lucida fonts I see in a 
> Google Image search are nothing like this.
>
> You're right, this does not OCR well. In fact, if you just crop out a part 
> of it to remove other noise, say, 09:43 AM, even with lots of margin 
> Tesseract isn't even finding anything it thinks looks like text in normal 
> page segmentation.
>
> The best I got (for the cropped out time) was:
>
> 39:43 HH
>
> So 28% incorrect.
>
> The definition of the 'M' is quite eroded already which is not great.
>
>
>
> On 20 August 2015 at 08:29, Amit Rao <[email protected]> wrote:
>
>> Hi folks, 
>>
>> I am using the Tesseract iOS SDK to OCR parking stubs. The parking stubs 
>> come primarily in 2 formats. Tesseract does quite well on one of the 
>> formats, but the OCR text for the second format is pretty much useless. I 
>> have attached the image that Tesseract is unable to OCR. If someone is able 
>> to report any success with OCRing this image, I would really appreciate it. 
>> So far I have tried the following, but none of it helps with the OCR 
>> results.
>>
>> 1. Cropping the image
>> 2. Reducing the height and width of the image with same/different aspect 
>> ratio
>> 3. Binarizing the image into black and white
>> 4. Filtering the image to smooth it. 
>>
>> I haven't tried augmenting the training data set yet. The font seems to 
>> be pretty standard (Lucida), and my understanding is that unless the font 
>> is non-standard, augmenting the training data will not be very useful. 
>>
>> Your help/suggestions will be greatly appreciated. 
>>
>> Thank you,
>> Amit Rao
>>
>>
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/f7ed92d0-6448-48c8-a404-774965d9b35a%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

