Thank you. SSOCR solved the problem. For best performance I suggest:
convert source.jpg -negate result.jpg && ssocr -T -d -1 ....
Best results are with digits on a white background.
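In case someone wants to script the two steps above, here is a minimal sketch of a Python wrapper. It only uses the options already mentioned (convert -negate, ssocr -T -d -1). The helper names build_commands and read_display and the "-negated" file naming are hypothetical, and since the rest of the ssocr command is elided above, passing the negated image as the last argument is my assumption.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(source, negated=None):
    """Return two argv lists: the ImageMagick negate step, then the ssocr step."""
    source = Path(source)
    if negated is None:
        # Hypothetical naming scheme: photo.jpg -> photo-negated.jpg
        negated = source.with_name(source.stem + "-negated" + source.suffix)
    negate_cmd = ["convert", str(source), "-negate", str(negated)]
    # -T: iterative thresholding; -d -1: detect the number of digits automatically.
    # Assumption: the negated image is the final argument to ssocr.
    ssocr_cmd = ["ssocr", "-T", "-d", "-1", str(negated)]
    return negate_cmd, ssocr_cmd


def read_display(source):
    """Run both steps and return whatever ssocr prints on stdout.

    Needs both convert and ssocr on PATH; raises if either is missing
    or if either command exits with a non-zero status.
    """
    negate_cmd, ssocr_cmd = build_commands(source)
    for tool in (negate_cmd[0], ssocr_cmd[0]):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(negate_cmd, check=True)
    result = subprocess.run(ssocr_cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Calling read_display("source.jpg") would write source-negated.jpg next to the original and return the recognized text. build_commands can be used on its own to inspect the commands without running anything.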
2012/12/6 zdenko podobny <[email protected]>
>
> On Wed, Dec 5, 2012 at 11:02 AM, mike oldfield <[email protected]> wrote:
>
>> Tesseract-ocr still has problems decoding LED-like digits.
>> I ran something like this on my squeeze command line:
>> convert 1.jpg 1.tif && tesseract 1.tif 1.txt nobatch digits
>> ...but the results are very poor and very far from the truth ;)
>>
>
> IMO converting from one format to another does not help.
> The "digits" config is a good option - it can help with the "S" vs "5",
> "B" vs "8", "l" vs "1" problem...
> I would suggest using pagesegmode 7 or 8.
>
>
>> The same situation occurs when I make a negative (black digits on white
>> background):
>> convert 1.jpg -negate 1-negativ.jpg && tesseract 1.tif 1.txt nobatch
>> digits
>>
>> I wrote article:
>> http://www.seeingwithsound.com/ocr.htm
>> djpeg -grayscale -dither none -outfile 1.pnm 1.jpg && gocr -i 1.pnm -o
>> 1.txt
>> and the results are better (no "." but the digits are properly decoded,
>> except 1 instead of 7 and S instead of 5)
>>
>> I tried http://www.onlineocr.net/ and the results are impressive !!!
>> I reduced the picture to 176x144, YUYV palette instead of MJPEG,
>> brightness: 100%, 96x96 dpi, number of frames per picture: 50. The
>> results have "." in the right place, but there is a problem with
>> "7"->"1" and "5"->"S"
>>
>> I have a question: which algorithm is used by onlineocr.net? For example,
>> free-ocr.com and newocr.com give very bad results (same as tesseract
>> freshly installed in Debian)
>>
>
> I think it is not only about the OCR algorithm, but also about improving
> (preprocessing) the input image and OCR training. BTW: there is a dedicated
> 7-segment OCR[1].
> [1] http://www.unix-ag.uni-kl.de/~auerswal/ssocr/
>
>
>> I attach some pictures made by my logitech c120 (and a negative copy).
>>
>
> tesseract 1.jpg 1 quiet && cat 1.txt
> 35.3
>
> ;-)
>
>>
>> On Wednesday, 5 December 2012 at 08:06:20 UTC+1, Speedy wrote:
>>
>>> Just check out Ray's October 2007 paper "An Overview of the Tesseract
>>> OCR Engine" where it says:
>>>
>>> The first step is
>>> a connected component analysis in which outlines of
>>> the components are stored. This was a computationally
>>> expensive design decision at the time, but had a
>>> significant advantage: by inspection of the nesting of
>>> outlines, and the number of child and grandchild
>>> outlines, it is simple to detect inverse text and
>>> recognize it as easily as black-on-white text. Tesseract
>>> was probably the first OCR engine able to handle
>>> white-on-black text so trivially.
>>>
>>> And in fact, in our own application after image preprocessing we pass
>>> the binarized image as a white-on-black image to tesseract and never had
>>> problems with that. Of course, our training images are also
>>> white-on-black, so this might also affect our findings.
>>>
>>> Marcus
>>>
>>>
>>> On Tuesday, December 4, 2012 2:58:26 PM UTC+1, zdenop wrote:
>>>>
>>>> Where did you find "advertised features of tesseract is that it works
>>>> equally well for black-on-white and white-on-black text"? I never heard
>>>> about it.
>>>> See forum for other experience:
>>>> https://groups.google.com/d/topic/tesseract-ocr/XoX6t5Ih1IM/discussion
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> On Tue, Dec 4, 2012 at 2:42 PM, Speedy <[email protected]> wrote:
>>>>
>>>>> Why is a black background a problem? One of the advertised features of
>>>>> tesseract is that it works equally well for black-on-white and
>>>>> white-on-black text.
>>>>>
>>>>> Marcus
>>>>>
>>>>>
>>>>> On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>>>>>
>>>>>> Search the forum. I remember a discussion about a similar topic.
>>>>>> AFAIR: tesseract has a problem with letters (symbols) that consist of
>>>>>> several unconnected parts (e.g. dots, lines) - the solution should be
>>>>>> to preprocess the image (blur).
>>>>>>
>>>>>> Generally: a black background is a problem. The quality of the image
>>>>>> is too low (JPEG, quality: 75), there is no information about DPI...
>>>>>> Anyway, this "LED" font is not a standard font, so maybe training will
>>>>>> be needed.
>>>>>>
>>>>>> --
>>>>>> Zdenko
>>>>>>
>>>>>> On Tue, Dec 4, 2012 at 12:43 AM, mike oldfield
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> <https://lh5.googleusercontent.com/-Ly6oR_Rmkag/UL04-iH5XaI/AAAAAAAAAAU/J-T592D8834/s1600/1.jpg>
>>>>>>> Hello
>>>>>>>
>>>>>>> I'd like to recognize LED-like numbers/digits.
>>>>>>> I attached an image (jpg, 680x320, brightness 65%, contrast 100%).
>>>>>>> Are there any libraries or presets to decode these digits? For
>>>>>>> example, googledocuments conversion and free-ocr.com don't work.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to [email protected]
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en

