Sure, a little late, but here are the images without boxes. I'm sending both versions: thresholded and not.
https://www.dropbox.com/sh/3kj6dcbzv5bzp0r/AABTWe37gsIcS280-yYcr5zLa

The first one (003.png) is a special case because of the dots, which are insanely difficult to remove.

Thanks a lot!
Mike

On Thursday, May 15, 2014 at 09:15:52 UTC+2, zdenop wrote:
> Can you provide images without drawn boxes?
>
> Zdenko
>
> On Wed, May 14, 2014 at 11:51 PM, Mike <[email protected]> wrote:
>> Hey Zdenko, thanks for your response!
>>
>> Sorry I didn't show any examples. Here are the images:
>>
>> https://lh3.googleusercontent.com/-MYoguppN3Yk/U3PfzdqeoaI/AAAAAAAABd8/3Y-3algMNO8/s1600/Screenshot+2014-05-12+00.27.10.png
>> https://lh6.googleusercontent.com/-J1N9oqe_2YE/U3PfsZWBZhI/AAAAAAAABd0/QdOuZ4QfVt8/s1600/Screenshot+2014-05-12+00.25.54.png
>> https://lh3.googleusercontent.com/-aPxHy50szMk/U3Pf_xc0WZI/AAAAAAAABeE/6FQeYPl5M_g/s1600/Screenshot+2014-05-12+00.27.59.png
>> https://lh4.googleusercontent.com/-rBZB-nPgljI/U3PgF-1V3lI/AAAAAAAABeM/Oo1GIsupsLo/s1600/Screenshot+2014-05-12+00.28.08.png
>>
>> These are the preprocessing steps I did:
>> 1) DPI is as high as possible (letters are about 30-50 pixels high)
>> 2) adaptive thresholding is used to remove most of the noise, and it works quite well
>> 3) the image is framed with a white rectangle
>>
>> I didn't do:
>> 1) deskewing - the image is sometimes not perfectly horizontal (but it's just a couple of degrees off)
>> 2) any morphology filters such as erosion or dilation: in most cases they made the results worse
>> 3) any other image processing (blurring, enhancing, smoothing, etc.)
>>
>> Not sure if any other ideas were proposed. What puzzles me is why the boxes are sometimes well placed and other times placed just plain awfully. The biggest problem - as you can see - is taking two lines as one. I also tried a version without adaptive thresholding, but the problem stays the same.
>>
>> On Sunday, May 11, 2014 at 17:12:41 UTC+2, zdenop wrote:
>>> You did not provide any example image - it does not help ;-).
>>> Did you try the solutions suggested on the wiki or forum for improving images (it has been discussed here a few times)?
>>>
>>> Zdenko
>>>
>>> On Sat, May 10, 2014 at 7:03 PM, Mike <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> I'm working on a mobile app which uses the tesseract library for OCR. I trained tesseract for my own fonts, but the results are still very unstable. When I debug the results, it seems the library recognizes letters correctly if the boxes are found correctly. However, in many cases they are incorrect.
>>>>
>>>> For preprocessing I'm using adaptive thresholding, which works pretty well.
>>>>
>>>> The common problems with boxes are:
>>>> 1) detecting one character as two or vice versa
>>>> 2) detecting very long but narrow boxes covering a few lines
>>>> 3) not detecting boxes
>>>>
>>>> How can I improve box detection? Can I constrain their sizes or ratio?
>>>>
>>>> Any suggestions are appreciated.
>>>>
>>>> Mike
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2599fce4-947e-4093-bf01-f83e0945cfc8%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
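
For anyone following along, here is a rough sketch of the two preprocessing steps mentioned in this thread: adaptive thresholding followed by framing the image with a white rectangle. It is not the code from the app. It assumes a local-mean threshold (one common variant of adaptive thresholding) and works on a plain list of rows with 0-255 gray values so it runs without any imaging library. The function names and the `window`, `offset`, and `margin` parameters are made up for illustration.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image (list of rows, values 0-255) by local mean.

    A pixel becomes white (255) if it is brighter than the mean of its
    window x window neighbourhood minus `offset`, otherwise black (0).
    """
    h, w = len(gray), len(gray[0])
    # Integral image with an extra zero row/column: any window sum is O(1).
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)  # window clipped at edges
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(255 if gray[y][x] > mean - offset else 0)
        out.append(row)
    return out


def add_white_frame(binary, margin=10):
    """Surround the image with `margin` white pixels on every side."""
    width = len(binary[0]) + 2 * margin
    framed = [[255] * width for _ in range(margin)]
    for row in binary:
        framed.append([255] * margin + list(row) + [255] * margin)
    framed.extend([255] * width for _ in range(margin))
    return framed


# Tiny synthetic example: light background with one dark horizontal stroke.
page = [[200] * 8 for _ in range(8)]
page[3][2:6] = [40, 40, 40, 40]
framed = add_white_frame(adaptive_threshold(page, window=5, offset=10), margin=2)
```

On real photos the window size and offset need tuning against the letter height (30-50 pixels here), and a library routine will be much faster than this pure-Python loop.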

