Well, Leptonica also provides some binarization methods (see the source
code [1]). Some explanation can be found on the web [2]. Of course there are
other binarization methods with published code - e.g. C++ source code for
Niblack and Wolf can be found on Christian Wolf's page [3].
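
To give an idea of what such a local method does, here is a minimal sketch
of Niblack's rule in plain Python. It is not taken from Leptonica or from
Christian Wolf's code; the function name, the nested-list image format and
the defaults (window=15, k=-0.2) are my own assumptions for illustration.
Each pixel is compared against T = local mean + k * local standard
deviation, computed over a window clipped at the image borders.

```python
import math


def niblack_binarize(image, window=15, k=-0.2):
    """Binarize a grayscale image given as a list of rows of 0..255 ints.

    Returns a list of rows holding 1 for dark (text) pixels and 0 otherwise.
    The names and defaults here are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    half = window // 2

    # Integral images of the values and of the squared values, each with an
    # extra zero row and column, so any window sum costs four lookups.
    s1 = [[0] * (width + 1) for _ in range(height + 1)]
    s2 = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            v = image[y][x]
            s1[y + 1][x + 1] = v + s1[y][x + 1] + s1[y + 1][x] - s1[y][x]
            s2[y + 1][x + 1] = v * v + s2[y][x + 1] + s2[y + 1][x] - s2[y][x]

    out = [[0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            total = s1[y1][x1] - s1[y0][x1] - s1[y1][x0] + s1[y0][x0]
            total_sq = s2[y1][x1] - s2[y0][x1] - s2[y1][x0] + s2[y0][x0]
            mean = total / n
            # Clamp at zero to guard against tiny negative rounding errors.
            variance = max(0.0, total_sq / n - mean * mean)
            threshold = mean + k * math.sqrt(variance)
            out[y][x] = 1 if image[y][x] < threshold else 0
    return out
```

The binarized result would then be saved as an image and passed to
Tesseract in place of the original. In practice the window size and k need
tuning per image set.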

IMO in this case it would be worth having a look at page segmentation -
there is an (older) presentation of Leptonica's possibilities [4].

BTW: It looks like jasonlfunk made an implementation of the Kasar, Kumar,
Ramakrishnan paper using Python and OpenCV [5].

[1] https://tpgit.github.io/Leptonica/binarize_8c.html
[2] http://www.leptonica.com/binarization.html
[3] http://liris.cnrs.fr/christian.wolf/software/binarize/
[4] http://www.dicklyon.com/phototech/PhotoTech_11_DocImage_Slides.pdf
[5] https://github.com/jasonlfunk/ocr-text-extraction

Zdenko


On Tue, Jul 1, 2014 at 10:22 PM, Paul <[email protected]> wrote:

> This paper
> <http://www.m.cs.osakafu-u.ac.jp/cbdar2007/proceedings/papers/O1-1.pdf> 
> suggests
> a binarization approach that might be helpful with your imagery.
> Unfortunately you need to implement it on your own in a preprocessing step,
> since Tesseract only uses Otsu's method for binarization. Thus the bad
> results.
>
> On Friday, June 27, 2014 at 12:47:55 UTC+2, morteza neishaboori wrote:
>
>> Hello,
>> I want to train tesseract to detect words in such images in the link
>> below!
>> https://drive.google.com/folderview?id=0B3dLM0w0EeD-RFZVc1NjaGNqUlE&usp=sharing
>>
>> I tried, but it was not successful. I would be happy if somebody could
>> give me some hints on whether it is possible at all to do this with Tesseract.
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/c149a5a9-f72c-4fa8-8f78-9432715d380c%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/c149a5a9-f72c-4fa8-8f78-9432715d380c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

