On 4 December 2012 13:58, zdenko podobny <[email protected]> wrote:
>
> Where did you find "advertised features of tesseract is that it works
> equally well for black-on-white and white-on-black text"? I never heard
> about it.

It used to be mentioned fairly prominently in the README on the wiki,
I think. It's still mentioned here:
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf
in section 2:

Processing follows a traditional step-by-step
pipeline, but some of the stages were unusual in their
day, and possibly remain so even now. The first step is
a connected component analysis in which outlines of
the components are stored. This was a computationally
expensive design decision at the time, but had a
significant advantage: by inspection of the nesting of
outlines, and the number of child and grandchild
outlines, it is simple to detect inverse text and
recognize it as easily as black-on-white text. Tesseract
was probably the first OCR engine able to handle
white-on-black text so trivially.
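
The nesting idea in that passage can be illustrated with a small sketch.
This is not Tesseract's code. The `Outline` tree, `looks_inverse`, and
its thresholds (at least 3 children, at most 2 grandchildren per child)
are hypothetical, chosen only to show why counting children and
grandchildren separates the cases. White letters on a dark block show
up as many holes (children) inside one dark outline, and only letter
counters (like the inside of "O") appear as grandchildren. A single
black "o" has just one child. A frame around normal text has one child
(the frame's inner edge) with the letters as grandchildren.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outline:
    """Hypothetical outline node: one closed contour plus the contours nested directly inside it."""
    name: str
    children: List["Outline"] = field(default_factory=list)


def count_children(outline: Outline) -> int:
    """Number of outlines nested directly inside this one."""
    return len(outline.children)


def count_grandchildren(outline: Outline) -> int:
    """Number of outlines nested exactly two levels down."""
    return sum(len(child.children) for child in outline.children)


def looks_inverse(outline: Outline,
                  min_children: int = 3,
                  max_grandchildren_per_child: int = 2) -> bool:
    """Assumed heuristic, not Tesseract's actual rule: a dark outline holding
    several holes (the white glyphs), each with few islands of its own
    (dark letter counters), is treated as a block of inverse text."""
    children = count_children(outline)
    grandchildren = count_grandchildren(outline)
    return (children >= min_children
            and grandchildren <= max_grandchildren_per_child * children)


# White "HELLO" on a black block: five holes, one counter (inside the "O").
hello = Outline("block", [Outline("H"), Outline("E"), Outline("L"), Outline("L"),
                          Outline("O", [Outline("O-counter")])])
# A single black "o": one hole and nothing below it.
letter_o = Outline("o", [Outline("counter")])
# A black frame around normal text: one child (inner edge), letters are grandchildren.
frame = Outline("frame-outer", [Outline("frame-inner", [
    Outline("A", [Outline("A-counter")]),
    Outline("B", [Outline("B-top"), Outline("B-bottom")]),
])])

print(looks_inverse(hello))     # True
print(looks_inverse(letter_o))  # False
print(looks_inverse(frame))     # False
```

A real engine would also weigh outline sizes and positions; the point
here is only that the child/grandchild counts are cheap to get once the
outlines are stored with their nesting.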

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
