Well, Leptonica also provides some binarization methods (see the source code [1]). Some explanation can be found on the web [2]. Of course, there are other binarization methods with published code, e.g. C++ source code for Niblack and Wolf can be found on Christian Wolf's page [3].
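Here is a minimal pure-Python sketch of three thresholds. It is not Leptonica's or Christian Wolf's code. The first is a global Otsu threshold, which Paul's reply in this thread says is the only method Tesseract uses. The other two are local methods: Niblack's T = m + k*s and Wolf and Jolion's T = (1 - k)*m + k*M + k*(s/R)*(m - M). Here m and s are the mean and standard deviation in a w x w window around each pixel, M is the darkest gray value in the image, and R is the largest local standard deviation. The sketch assumes the image is a list of rows of 8-bit gray values with dark text on a light background. The function names, the window size w = 15 and the k values are illustrative assumptions, not tuned defaults from either implementation.

```python
# Illustrative sketch: function names and default parameters are assumptions,
# not Leptonica's or C. Wolf's API.
import math
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit gray values (0..255), dark text on light paper


def otsu_threshold(img: Image) -> int:
    """Global Otsu threshold: the t that maximizes between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_dark, sum_dark = 0, 0  # pixel count and intensity sum of values <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        m_dark = sum_dark / w_dark
        m_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (m_dark - m_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def _local_mean_std(img: Image, w: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Mean and std. deviation in a w x w window (clipped at borders) via integral images."""
    h, wd = len(img), len(img[0])
    s = [[0] * (wd + 1) for _ in range(h + 1)]   # integral image of v
    sq = [[0] * (wd + 1) for _ in range(h + 1)]  # integral image of v*v
    for y in range(h):
        for x in range(wd):
            v = img[y][x]
            s[y + 1][x + 1] = v + s[y][x + 1] + s[y + 1][x] - s[y][x]
            sq[y + 1][x + 1] = v * v + sq[y][x + 1] + sq[y + 1][x] - sq[y][x]
    r = w // 2
    mean = [[0.0] * wd for _ in range(h)]
    std = [[0.0] * wd for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(wd):
            x0, x1 = max(0, x - r), min(wd, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            tot = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            tot2 = sq[y1][x1] - sq[y0][x1] - sq[y1][x0] + sq[y0][x0]
            m = tot / n
            mean[y][x] = m
            std[y][x] = math.sqrt(max(0.0, tot2 / n - m * m))
    return mean, std


def binarize_otsu(img: Image) -> Image:
    """One global threshold for the whole page; pixels above it become white."""
    t = otsu_threshold(img)
    return [[255 if v > t else 0 for v in row] for row in img]


def niblack(img: Image, w: int = 15, k: float = -0.2) -> Image:
    """Niblack: T = m + k*s per pixel; pixels above T become white (255)."""
    mean, std = _local_mean_std(img, w)
    return [[255 if v > mean[y][x] + k * std[y][x] else 0
             for x, v in enumerate(row)] for y, row in enumerate(img)]


def wolf(img: Image, w: int = 15, k: float = 0.5) -> Image:
    """Wolf/Jolion: T = (1-k)*m + k*M + k*(s/R)*(m - M); pixels above T become white."""
    mean, std = _local_mean_std(img, w)
    M = min(min(row) for row in img)          # darkest gray value in the image
    R = max(max(row) for row in std) or 1.0   # largest local std (avoid /0 on flat images)
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, v in enumerate(row):
            m, s = mean[y][x], std[y][x]
            t = (1 - k) * m + k * M + k * (s / R) * (m - M)
            out_row.append(255 if v > t else 0)
        out.append(out_row)
    return out
```

Consider a flat 200-gray page with a small 50-gray block. Otsu and the Wolf variant both keep the page white and the block black. Niblack also turns windows that contain only flat background black, because there s = 0 and the threshold equals the pixel value. That background noise is a known weakness of Niblack's method and one of the motivations for the Sauvola and Wolf variants.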
IMO in this case it would be worth having a look at page segmentation; there is an (older) presentation of Leptonica's possibilities [4]. BTW: it looks like jasonlfunk made an implementation of the Kasar, Kumar, Ramakrishnan paper using Python and OpenCV [5]...

[1] https://tpgit.github.io/Leptonica/binarize_8c.html
[2] http://www.leptonica.com/binarization.html
[3] http://liris.cnrs.fr/christian.wolf/software/binarize/
[4] http://www.dicklyon.com/phototech/PhotoTech_11_DocImage_Slides.pdf
[5] https://github.com/jasonlfunk/ocr-text-extraction

Zdenko

On Tue, Jul 1, 2014 at 10:22 PM, Paul <[email protected]> wrote:
> This paper
> <http://www.m.cs.osakafu-u.ac.jp/cbdar2007/proceedings/papers/O1-1.pdf>
> suggests a binarization approach that might be helpful with your imagery.
> Unfortunately, you need to implement it yourself as a preprocessing step,
> since Tesseract only uses Otsu's method for binarization; hence the bad
> results.
>
> On Friday, June 27, 2014 12:47:55 UTC+2, morteza neishaboori wrote:
>
>> Hello,
>> I want to train Tesseract to detect words in images like the ones in the
>> link below:
>> https://drive.google.com/folderview?id=0B3dLM0w0EeD-RFZVc1NjaGNqUlE&usp=sharing
>>
>> I tried, but it was not successful. Now I would be happy if somebody could
>> give me some hints on whether it's at all possible to do this with Tesseract.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/c149a5a9-f72c-4fa8-8f78-9432715d380c%40googlegroups.com
>
> For more options, visit https://groups.google.com/d/optout.

