Re: [tesseract-ocr] improving boxes

Mike Wed, 14 May 2014 23:47:27 -0700

Hey Zdenko, thanks for your response! 

Sorry I didn't include any examples. Here are the images:

<https://lh3.googleusercontent.com/-MYoguppN3Yk/U3PfzdqeoaI/AAAAAAAABd8/3Y-3algMNO8/s1600/Screenshot+2014-05-12+00.27.10.png>
<https://lh6.googleusercontent.com/-J1N9oqe_2YE/U3PfsZWBZhI/AAAAAAAABd0/QdOuZ4QfVt8/s1600/Screenshot+2014-05-12+00.25.54.png>
<https://lh3.googleusercontent.com/-aPxHy50szMk/U3Pf_xc0WZI/AAAAAAAABeE/6FQeYPl5M_g/s1600/Screenshot+2014-05-12+00.27.59.png>
<https://lh4.googleusercontent.com/-rBZB-nPgljI/U3PgF-1V3lI/AAAAAAAABeM/Oo1GIsupsLo/s1600/Screenshot+2014-05-12+00.28.08.png>


Preprocessing steps I applied:
1) the DPI is as high as possible (letters are about 30-50 pixels high)
2) adaptive thresholding removes most of the noise, and it works quite 
well
3) the image is framed with a white rectangle.
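
For reference, steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code actually used in the app (which wasn't posted): it assumes an 8-bit grayscale image stored as a list of pixel rows, and the block size, offset `c` and border width are made-up defaults that would need tuning. OpenCV's cv2.adaptiveThreshold (with ADAPTIVE_THRESH_MEAN_C) and cv2.copyMakeBorder offer equivalent operations.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black .. 255 = white


def adaptive_mean_threshold(img: Image, block: int = 15, c: int = 10) -> Image:
    """Binarize with a local-mean threshold: a pixel becomes white (255) if it
    is brighter than the mean of its block x block neighbourhood minus c,
    otherwise black (0). Near the image edges the window is simply clipped
    (OpenCV pads by replication instead, so edge pixels may differ)."""
    h, w = len(img), len(img[0])
    r = block // 2
    # Integral image with an extra zero row/column so box sums need no special cases.
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if img[y][x] > mean - c else 0
    return out


def add_white_frame(img: Image, border: int = 10) -> Image:
    """Surround the image with a white margin `border` pixels wide."""
    w = len(img[0]) + 2 * border
    framed = [[255] * w for _ in range(border)]
    framed += [[255] * border + row + [255] * border for row in img]
    framed += [[255] * w for _ in range(border)]
    return framed
```

The local mean is what makes this tolerant of uneven lighting in phone photos: a pixel is only marked as ink if it is noticeably darker than its own neighbourhood, so a gradual shadow across the page stays white.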

What I didn't do:
1) deskewing: the image is sometimes not perfectly horizontal (but only a 
couple of degrees off)
2) morphological filters such as erosion or dilation: in most cases they 
made the results worse
3) any other image processing (blurring, enhancing, smoothing, etc.)
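
Since deskewing was skipped, here is one way to at least measure the tilt, as a hedged sketch: the classic projection-profile method, which tries a range of candidate angles and keeps the one where the ink piles up into the fewest rows. It assumes an already binarized image (0 = ink, 255 = background) stored as a list of rows; the +/-5 degree search range and 0.5 degree step are arbitrary assumptions, and the actual rotation is left to whatever image library the app already uses.

```python
import math
from typing import Dict, Sequence


def estimate_skew(binary: Sequence[Sequence[int]], max_angle: float = 5.0,
                  step: float = 0.5) -> float:
    """Return the candidate angle (degrees) at which the text rows are most level.

    Every ink pixel (value 0) is rotated by each candidate angle and binned by
    its new row. When the lines are level the ink concentrates in few rows, so
    the sum of squared row counts is largest at the true skew angle."""
    ink = [(x, y) for y, row in enumerate(binary)
           for x, v in enumerate(row) if v == 0]
    best_angle, best_score = 0.0, -1
    n_steps = int(round(max_angle / step))
    for i in range(-n_steps, n_steps + 1):
        angle = i * step
        t = math.radians(angle)
        sin_t, cos_t = math.sin(t), math.cos(t)
        rows: Dict[int, int] = {}
        for x, y in ink:
            # Row index of the pixel after rotating the coordinates by `angle`.
            ry = round(y * cos_t - x * sin_t)
            rows[ry] = rows.get(ry, 0) + 1
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

With y growing downward, a positive result means the text lines descend to the right, so the image would be rotated back by that amount before OCR (check the sign convention of the rotation routine you use). Whether a couple of degrees of skew contributes to two lines being merged into one box is an open question, but this makes it cheap to check.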


I'm not sure whether any other ideas were proposed. What puzzles me is why 
the boxes are sometimes placed well and other times just plain awfully. 
The biggest problem, as you can see, is that two lines are taken as one. 
I also tried a version without adaptive thresholding, but the problem 
stays the same.




On Sunday, 11 May 2014 17:12:41 UTC+2, user zdenop wrote:
>
> You did not provide any example images, which does not help ;-).
> Did you try the image-improvement solutions suggested on the wiki or in 
> the forum (this was discussed here a few times)?
>
> Zdenko
>
>
> On Sat, May 10, 2014 at 7:03 PM, Mike <[email protected]> wrote:
>
>> Hello,
>>
>> I'm working on a mobile app that uses the Tesseract library for OCR. I 
>> trained Tesseract on my own fonts, but the results are still very 
>> unstable. When I debug the results, it seems the library recognizes 
>> letters correctly whenever the boxes are found correctly. However, in 
>> many cases the boxes are wrong.
>>
>> For preprocessing I'm using adaptive thresholding, which deals with noise 
>> pretty well. 
>>
>> The common problems with boxes are:
>> 1) detecting one character as two or vice versa
>> 2) detecting very long but narrow boxes covering several lines
>> 3) not detecting boxes
>>
>> How can I improve box detection? Can I constrain their sizes or aspect ratios?
>>
>> Any suggestions are appreciated.
>>
>>
>> Mike
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/2599fce4-947e-4093-bf01-f83e0945cfc8%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/9e4d4bb3-326c-4376-ae67-59ea5e377ea0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
