Tesseract-ocr still has problems decoding LED-like digits.
I ran something like this on my Debian squeeze command line:
convert 1.jpg 1.tif && tesseract 1.tif 1.txt nobatch digits
...but the results are very poor and very far from the truth ;)
The situation is the same when I make a negative (black digits on a white 
background):
convert 1.jpg -negate 1-negativ.tif && tesseract 1-negativ.tif 1.txt nobatch digits
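
Zdenko's earlier advice in this thread (blur the image so that the separate parts of each digit join up, and give tesseract a DPI value) could be tried with something like the sketch below. It is untested on these pictures: preprocess_led is a hypothetical helper name, and the blur sigma, the 50% threshold and the 300 dpi value are all guesses that would need tuning.

```shell
# Hypothetical helper: preprocess an LED-display photo, then run tesseract.
# $1 = input photo (e.g. 1.jpg), $2 = output base name (e.g. 1-pre).
# All numeric values are untested guesses, not recommended settings.
preprocess_led() {
    # Grayscale, blur so the separate segments of a digit merge into one
    # shape, binarize, invert to black-on-white, and record a DPI value.
    # tesseract then appends ".txt" to the output base name by itself.
    convert "$1" -colorspace Gray -blur 0x2 -threshold 50% -negate \
        -units PixelsPerInch -density 300 "$2.tif" &&
    tesseract "$2.tif" "$2" nobatch digits
}
```

Called as `preprocess_led 1.jpg 1-pre`, this would leave the recognized text in 1-pre.txt.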

I read this article:
http://www.seeingwithsound.com/ocr.htm
djpeg -grayscale -dither none -outfile 1.pnm 1.jpg && gocr -i 1.pnm -o 1.txt
and the results are better: the "." is missing, but the digits are decoded 
properly, except for 1 instead of 7 and S instead of 5.
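
Since the display can only show digits and ".", the S-for-5 mistake can probably be patched up after recognition. A minimal sketch, where fix_digits is a hypothetical helper name and the O-to-0 mapping is my own assumption (only S-for-5 was actually observed):

```shell
# Hypothetical helper: map letters that look like digits back to digits.
# Only reasonable because the display shows nothing but digits and ".".
# S -> 5 was observed in the gocr output; O -> 0 is an assumption.
fix_digits() {
    tr 'SsOo' '5500'
}
```

For example, `fix_digits < 1.txt` prints the corrected text. The 1-instead-of-7 mistake cannot be repaired this way, because both are valid digits.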

I tried http://www.onlineocr.net/ and the results are impressive! I 
reduced the picture to 176x144, with the YUYV palette instead of MJPEG, 
brightness 100%, 96x96 dpi and 50 frames per picture. The results have the 
"." in the right place, but the "7"->"1" and "5"->"S" problem remains.

I have a question: which algorithm is used by onlineocr.net? For example, 
free-ocr.com and newocr.com give very bad results (the same as a freshly 
installed tesseract in Debian).

I attach some pictures taken with my Logitech C120 (and a negative copy).





On Wednesday, December 5, 2012 08:06:20 UTC+1, Speedy wrote:
>
> Just check out Ray's October 2007 paper "An Overview of the Tesseract OCR 
> Engine" where it says:
>
> The first step is
> a connected component analysis in which outlines of
> the components are stored. This was a computationally
> expensive design decision at the time, but had a
> significant advantage: by inspection of the nesting of
> outlines, and the number of child and grandchild
> outlines, it is simple to detect inverse text and
> recognize it as easily as black-on-white text. Tesseract
> was probably the first OCR engine able to handle
> white-on-black text so trivially. 
>
> And in fact, in our own application after image preprocessing we pass the 
> binarized image as a white-on-black image to tesseract and never had 
> problems with that. Of course, our training images are also white-on-black, 
> so this might also affect our findings.
>
> Marcus
>
>
> On Tuesday, December 4, 2012 2:58:26 PM UTC+1, zdenop wrote:
>>
>> Where did you find "advertised features of tesseract is that it works 
>> equally well for black-on-white and white-on-black text"? I never heard 
>> about it. 
>> See forum for other experience: 
>> https://groups.google.com/d/topic/tesseract-ocr/XoX6t5Ih1IM/discussion
>>
>> -- 
>> Zdenko
>>
>> On Tue, Dec 4, 2012 at 2:42 PM, Speedy <[email protected]> wrote:
>>
>>> Why is a black background a problem? One of the advertised features of 
>>> tesseract is that it works equally well for black-on-white and 
>>> white-on-black text. 
>>>
>>> Marcus
>>>
>>>
>>> On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>>>
>>>> Search the forum. I remember a discussion about a similar topic.
>>>> AFAIR: tesseract has problems with letters (symbols) that consist of 
>>>> several unconnected parts (e.g. dots, lines); the solution should be to 
>>>> preprocess the image (blur).
>>>>
>>>> Generally: a black background is a problem. The image quality is too low 
>>>> (JPEG, quality: 75), there is no information about DPI... Anyway, this 
>>>> "LED" font is not a standard font, so maybe training will be needed.
>>>>
>>>> -- 
>>>> Zdenko
>>>>
>>>> On Tue, Dec 4, 2012 at 12:43 AM, mike oldfield <[email protected]> wrote:
>>>>
>>>>>
>>>>> <https://lh5.googleusercontent.com/-Ly6oR_Rmkag/UL04-iH5XaI/AAAAAAAAAAU/J-T592D8834/s1600/1.jpg>
>>>>> Hello 
>>>>>
>>>>> I'd like to recognize LED-like numbers/digits.
>>>>> I attached an image (jpg, 680x320, brightness 65%, contrast 100%).
>>>>> Are there any libraries or presets to decode these digits? For example, 
>>>>> Google Documents conversion and free-ocr.com don't work.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  -- 
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>>
>>>>> To unsubscribe from this group, send email to
>>>>> tesseract-oc...@googlegroups.com
>>>>>
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>
>>>
>>
>>
>> 


<<attachment: 1.jpg>>

<<attachment: 1-n.jpg>>
