Just check out Ray's October 2007 paper "An Overview of the Tesseract OCR 
Engine" where it says:

"The first step is a connected component analysis in which outlines of the
components are stored. This was a computationally expensive design decision
at the time, but had a significant advantage: by inspection of the nesting
of outlines, and the number of child and grandchild outlines, it is simple
to detect inverse text and recognize it as easily as black-on-white text.
Tesseract was probably the first OCR engine able to handle white-on-black
text so trivially."

And in fact, in our own application, after image preprocessing, we pass the 
binarized image to tesseract as a white-on-black image and have never had 
problems with that. Of course, our training images are also white-on-black, 
so this might also affect our findings.
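
A minimal sketch of that kind of preprocessing, for illustration only. It 
assumes an 8-bit grayscale image held as a plain list of rows, a fixed 
threshold of 128, and that text covers less than half of the image. The 
function names are hypothetical, and this is not our actual pipeline or 
part of tesseract.

```python
def binarize(gray, threshold=128):
    """Map 8-bit grayscale pixels to 0 or 255 using a fixed threshold (assumed value)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def invert(binary):
    """Swap black and white in a 0/255 image."""
    return [[255 - px for px in row] for row in binary]


def to_white_on_black(gray, threshold=128):
    """Binarize, then flip polarity if the majority (background) colour is white."""
    binary = binarize(gray, threshold)
    total = sum(len(row) for row in binary)
    white = sum(1 for row in binary for px in row if px == 255)
    # Assumption: text covers less than half the image,
    # so the majority colour is taken to be the background.
    return invert(binary) if white * 2 > total else binary
```

The result can then be written to an image file with whatever imaging 
library is at hand and passed to tesseract.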

Marcus


On Tuesday, December 4, 2012 2:58:26 PM UTC+1, zdenop wrote:
>
> Where did you find that "one of the advertised features of tesseract is 
> that it works equally well for black-on-white and white-on-black text"? 
> I have never heard of it. 
> See the forum for other users' experience: 
> https://groups.google.com/d/topic/tesseract-ocr/XoX6t5Ih1IM/discussion
>
> -- 
> Zdenko
>
> On Tue, Dec 4, 2012 at 2:42 PM, Speedy <[email protected]> wrote:
>
>> Why is a black background a problem? One of the advertised features of 
>> tesseract is that it works equally well for black-on-white and 
>> white-on-black text. 
>>
>> Marcus
>>
>>
>> On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>>
>>> Search the forum. I remember a discussion about a similar topic.
>>> AFAIR: tesseract has a problem with letters (symbols) that consist of 
>>> several unconnected parts (e.g. dots, lines). The solution should be to 
>>> preprocess the image (blur).
>>>
>>> Generally: the black background is a problem. The image quality is too 
>>> low (JPEG, quality: 75), and there is no information about DPI... Anyway, 
>>> this "LED" font is not a standard font, so training may be needed.
>>>
>>> -- 
>>> Zdenko
>>>
>>> On Tue, Dec 4, 2012 at 12:43 AM, mike oldfield <[email protected]> wrote:
>>>
>>>>
>>>> <https://lh5.googleusercontent.com/-Ly6oR_Rmkag/UL04-iH5XaI/AAAAAAAAAAU/J-T592D8834/s1600/1.jpg>
>>>> Hello 
>>>>
>>>> I'd like to recognize LED-like numbers/digits.
>>>> I attached an image (jpg, 680x320, brightness 65%, contrast 100%).
>>>> Are there any libraries or presets to decode these digits? For example, 
>>>> Google Documents conversion and free-ocr.com don't work.
>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>>
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>
>
>
>  
