Sure, a little late, but here are the images without boxes. I'm sending both
versions: thresholded and not.

https://www.dropbox.com/sh/3kj6dcbzv5bzp0r/AABTWe37gsIcS280-yYcr5zLa

The first one (003.png) is a special case because of the dots, which are
insanely difficult to remove.

Thanks a lot!

Mike

On Thursday, 15 May 2014 09:15:52 UTC+2, user zdenop wrote:
>
> Can you provide the images without the drawn boxes?
>
> Zdenko
>
>
> On Wed, May 14, 2014 at 11:51 PM, Mike <[email protected]> wrote:
>
>> Hey Zdenko, thanks for your response! 
>>
>> Sorry I didn't show any examples. Here are the images:
>>
>>
>>
>> <https://lh3.googleusercontent.com/-MYoguppN3Yk/U3PfzdqeoaI/AAAAAAAABd8/3Y-3algMNO8/s1600/Screenshot+2014-05-12+00.27.10.png>
>> <https://lh6.googleusercontent.com/-J1N9oqe_2YE/U3PfsZWBZhI/AAAAAAAABd0/QdOuZ4QfVt8/s1600/Screenshot+2014-05-12+00.25.54.png>
>> <https://lh3.googleusercontent.com/-aPxHy50szMk/U3Pf_xc0WZI/AAAAAAAABeE/6FQeYPl5M_g/s1600/Screenshot+2014-05-12+00.27.59.png>
>> <https://lh4.googleusercontent.com/-rBZB-nPgljI/U3PgF-1V3lI/AAAAAAAABeM/Oo1GIsupsLo/s1600/Screenshot+2014-05-12+00.28.08.png>
>>
>>
>> I applied a few preprocessing steps:
>> 1) The DPI is as high as possible (letters are about 30-50 pixels high).
>> 2) Adaptive thresholding is used to remove most of the noise, and it
>> works quite well.
>> 3) The image is framed with a white rectangle.
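For anyone who wants to reproduce steps 2 and 3, here is a minimal sketch in plain Python. It is not the code from my app: the mean-based rule, the function names, and the `window`, `offset` and `margin` defaults are all assumptions chosen for illustration, and the image is just a list of rows of 0-255 gray values.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes black (0) when it is darker than the mean of its
    window x window neighbourhood minus `offset`; otherwise white (255).
    """
    h, w = len(gray), len(gray[0])
    # Integral image with an extra zero row/column, so any box sum is O(1).
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    r = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            if gray[y][x] < mean - offset:
                out[y][x] = 0
    return out


def add_white_border(binary, margin=10):
    """Frame the image with `margin` white pixels on every side."""
    w = len(binary[0])
    blank = [255] * (w + 2 * margin)
    framed = [list(blank) for _ in range(margin)]
    for row in binary:
        framed.append([255] * margin + list(row) + [255] * margin)
    framed.extend(list(blank) for _ in range(margin))
    return framed
```

A window somewhat larger than the stroke width is probably a reasonable starting point, but the right values depend on the images.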
>>
>> I didn't do:
>> 1) Deskewing: the image is sometimes not perfectly horizontal (but it's
>> just a couple of degrees off).
>> 2) Any morphological filters such as erosion or dilation: in most cases
>> they worsened the results.
>> 3) Any other image processing (blurring, enhancing, smoothing, etc.).
>>
>>
>> Not sure if any other ideas were proposed. What makes me wonder is why
>> those boxes are sometimes well placed and at other times placed just plain
>> awfully. The biggest problem, as you can see, is taking two lines as one.
>> I also tried a version without adaptive thresholding, but the problem
>> stays the same.
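One workaround I can think of for the two-lines-as-one problem is to cut the page into single text lines before OCR, using a horizontal projection profile (counting ink pixels per row). The sketch below is only an illustration: `split_into_lines`, `min_gap` and `min_ink` are made-up names and defaults, and it assumes a binarized image stored as a list of rows with 0 for ink and 255 for background. It also assumes the text is close to horizontal; with a skew of a couple of degrees the blank rows between lines may partly fill in.

```python
def split_into_lines(binary, min_gap=2, min_ink=1):
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    Rows with fewer than `min_ink` black pixels count as blank. A line is
    closed only after at least `min_gap` consecutive blank rows.
    """
    ink = [sum(1 for p in row if p == 0) for row in binary]
    lines = []
    start = None      # first row of the line currently being collected
    blank_run = 0     # consecutive blank rows seen inside that line
    for y, count in enumerate(ink):
        if count >= min_ink:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The line ended where the blank run began.
                lines.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Image ended inside a line; drop any trailing blank rows.
        lines.append((start, len(ink) - blank_run))
    return lines
```

Each returned range can then be cropped and passed to the OCR step on its own, so a box can no longer span two lines.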
>>
>>
>>
>>
>> On Sunday, 11 May 2014 17:12:41 UTC+2, user zdenop wrote:
>>>
>>>  You did not provide any example image - it does not help ;-).
>>> Did you try the solutions suggested on the wiki or forum for improving
>>> images (this was discussed here a few times)?
>>>
>>> Zdenko
>>>
>>>
>>> On Sat, May 10, 2014 at 7:03 PM, Mike <[email protected]> wrote:
>>>
>>>>  Hello,
>>>>
>>>> I'm working on a mobile app which uses the tesseract library for OCR. I
>>>> trained tesseract on my own fonts, but the results are still very
>>>> unstable. When I debug the results, it seems the library recognizes
>>>> letters correctly if the boxes are found correctly. However, in many
>>>> cases they are incorrect.
>>>>
>>>> For preprocessing I'm using adaptive thresholding, which deals with
>>>> the noise pretty well.
>>>>
>>>> The common problems with boxes are:
>>>> 1) detecting one character as two, or vice versa
>>>> 2) detecting very long but narrow boxes covering a few lines
>>>> 3) not detecting boxes at all
>>>>
>>>> How can I improve box detection? Can I constrain their sizes or aspect
>>>> ratio?
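To make the question concrete, this is the kind of constraint I have in mind, applied as a post-filter on whatever boxes the engine reports. It is not a Tesseract setting. The `Box` type, the function names and all limits are hypothetical; the height limits are loosely based on letters being about 30-50 pixels high, and the width/height ratio limits are guesses.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned character box in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def is_plausible(box: Box, min_h: int = 20, max_h: int = 60,
                 min_ratio: float = 0.1, max_ratio: float = 1.5) -> bool:
    """Check height and width/height ratio against assumed limits."""
    if box.width <= 0 or box.height <= 0:
        return False
    if not (min_h <= box.height <= max_h):
        return False  # too small (noise) or too tall (spans several lines)
    ratio = box.width / box.height
    return min_ratio <= ratio <= max_ratio


def filter_boxes(boxes: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Split boxes into (kept, rejected) using is_plausible."""
    kept = [b for b in boxes if is_plausible(b)]
    rejected = [b for b in boxes if not is_plausible(b)]
    return kept, rejected
```

Rejected boxes would still need to be re-segmented somehow; the filter only flags the cases from the list above, it does not fix them.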
>>>>
>>>> Any suggestions are appreciated.
>>>>
>>>>
>>>> Mike
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>>
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/2599fce4-947e-4093-bf01-f83e0945cfc8%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>

