Re: problem with LED-fonts recognition ;(

Tomek Suchecki Sat, 08 Dec 2012 01:19:49 -0800

Thank you. SSOCR solved the problem. For best performance I suggest:
convert source.jpg -negate result.jpg && ssocr -T -d -1 ....
The best results are with digits on a white background.
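
For anyone who wants to script this, here is a minimal sketch, assuming
ImageMagick's convert and ssocr are both on the PATH. The function name
led_ocr, the DRY_RUN switch and the file names are made up for illustration,
and passing the negated image as the last argument to ssocr is my assumption
about the part of the command left out above. Note that convert takes the
output file as its last argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of ssocr or ImageMagick): negate an image,
# then run ssocr on the result. File names are placeholders.
led_ocr() {
    src=$1   # original picture, e.g. bright LED digits on a dark background
    neg=$2   # where to write the negated copy

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the two commands instead of running them.
        echo "convert $src -negate $neg"
        echo "ssocr -T -d -1 $neg"
        return 0
    fi

    # Invert the image so the digits end up dark on a light background.
    convert "$src" -negate "$neg" &&
    # -T: iterative thresholding; -d -1: detect the number of digits.
    # Assumption: the image is passed as the final argument.
    ssocr -T -d -1 "$neg"
}
```

Calling `DRY_RUN=1 led_ocr source.jpg result.jpg` only prints the two
commands, which is handy for checking the file names before running them.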


2012/12/6 zdenko podobny <[email protected]>

>
> On Wed, Dec 5, 2012 at 11:02 AM, mike oldfield <[email protected]> wrote:
>
>> Tesseract-ocr still has problems decoding LED-like digits.
>> I ran something like this on my squeeze command line:
>> convert 1.jpg 1.tif && tesseract 1.tif 1.txt nobatch digits
>> ...but the results are very poor and very far from the truth ;)
>>
>
> IMO converting from one format to another does not help.
> The "digits" config is a good option: it can help with the "S" vs "5", "B"
> vs "8", and "l" vs "1" problems...
> I would suggest using pagesegmode 7 or 8.
>
>
>> The same situation occurs when I make a negative (black digits on a white
>> background):
>> convert 1.jpg -negate 1-negativ.jpg && tesseract 1.tif 1.txt nobatch
>> digits
>>
>> I wrote article:
>> http://www.seeingwithsound.com/ocr.htm
>> djpeg -grayscale -dither none  -outfile 1.pnm 1.jpg  &&  gocr -i 1.pnm -o
>> 1.txt
>> and the results are better (no "." but the digits are mostly decoded
>> properly, except for 1 instead of 7 and S instead of 5)
>>
>> I tried http://www.onlineocr.net/ and the results are impressive!
>> I reduced the picture to 176x144, YUYV palette instead of MJPEG, brightness
>> 100%, 96x96 dpi, 50 frames per picture; the result has the "." in the
>> right place, but there is still the "7"->"1" and "5"->"S" problem.
>>
>> I have a question: which algorithm is used by onlineocr.net? For example,
>> free-ocr.com and newocr.com give very bad results (the same as a freshly
>> installed tesseract in Debian).
>>
> I think it is not only about the OCR algorithm, but also about improving
> (preprocessing) the input image and OCR training. BTW: there is a dedicated
> 7-segment OCR[1].
> [1] http://www.unix-ag.uni-kl.de/~auerswal/ssocr/
>
>
>> I attach some pictures taken with my Logitech C120 (and negative copies).
>>
>
> >  tesseract 1.jpg 1 quiet && cat 1.txt
> 35.3
>
> ;-)
>
>>
>>
>>
>>
>>
>> On Wednesday, December 5, 2012 08:06:20 UTC+1, Speedy wrote:
>>
>>> Just check out Ray's October 2007 paper "An Overview of the Tesseract
>>> OCR Engine" where it says:
>>>
>>> The first step is
>>> a connected component analysis in which outlines of
>>> the components are stored. This was a computationally
>>> expensive design decision at the time, but had a
>>> significant advantage: by inspection of the nesting of
>>> outlines, and the number of child and grandchild
>>> outlines, it is simple to detect inverse text and
>>> recognize it as easily as black-on-white text. Tesseract
>>> was probably the first OCR engine able to handle
>>> white-on-black text so trivially.
>>>
>>> And in fact, in our own application after image preprocessing we pass
>>> the binarized image as a white-on-black image to tesseract and never had
>>> problems with that. Of course, our training images are also white-on-black,
>>> so this might also affect our findings.
>>>
>>> Marcus
>>>
>>>
>>> On Tuesday, December 4, 2012 2:58:26 PM UTC+1, zdenop wrote:
>>>>
>>>> Where did you find "advertised features of tesseract is that it works
>>>> equally well for black-on-white and white-on-black text"? I never heard
>>>> about it.
>>>> See the forum for other experience:
>>>> https://groups.google.com/d/topic/tesseract-ocr/XoX6t5Ih1IM/discussion
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> On Tue, Dec 4, 2012 at 2:42 PM, Speedy <[email protected]> wrote:
>>>>
>>>>> Why is a black background a problem? One of the advertised features of
>>>>> tesseract is that it works equally well for black-on-white and
>>>>> white-on-black text.
>>>>>
>>>>> Marcus
>>>>>
>>>>>
>>>>> On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>>>>>
>>>>>> Search the forum. I remember a discussion about a similar topic.
>>>>>> AFAIR: tesseract has problems with letters (symbols) that consist of
>>>>>> several unconnected parts (e.g. dots, lines); the solution should be to
>>>>>> preprocess the image (blur).
>>>>>>
>>>>>> Generally: a black background is a problem. The image quality is too
>>>>>> low (JPEG, quality: 75), and there is no information about DPI...
>>>>>> Anyway, this "LED" font is not a standard font, so training may be
>>>>>> needed.
>>>>>>
>>>>>> --
>>>>>> Zdenko
>>>>>>
>>>>>> On Tue, Dec 4, 2012 at 12:43 AM, mike oldfield <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> <https://lh5.googleusercontent.com/-Ly6oR_Rmkag/UL04-iH5XaI/AAAAAAAAAAU/J-T592D8834/s1600/1.jpg>
>>>>>>> Hello
>>>>>>>
>>>>>>> I'd like to recognize LED-like numbers/digits.
>>>>>>> I attached an image (jpg, 680x320, brightness 65%, contrast 100%).
>>>>>>> Are there any libraries or presets to decode these digits? For
>>>>>>> example, Google Documents conversion and free-ocr.com don't work.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to [email protected]
>>>>>>>
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
>
>

