Tesseract-ocr still has a problem decoding LED-like digits. I ran this on my Debian squeeze command line:

    convert 1.jpg 1.tif && tesseract 1.tif 1.txt nobatch digits

...but the results are very poor and very far from the truth ;) The situation is the same when I make a negative (black digits on a white background):

    convert 1.jpg -negate 1-negativ.jpg && tesseract 1.tif 1.txt nobatch digits
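One detail in the second command above: it writes the negated image to 1-negativ.jpg but still passes 1.tif to tesseract, so as written the negative may never have been read. A minimal sketch that feeds the negated file itself to tesseract could look like the following. The function name ocr_negative_digits, the output file names, and the -colorspace Gray and -compress none options are assumptions for illustration, not part of the original commands; nobatch and digits are the config names already used above.

```shell
#!/bin/sh
# Sketch only: negate the photo and OCR the negated file itself.
# Assumes ImageMagick's convert and tesseract are on PATH.
# ocr_negative_digits is a hypothetical helper name.
ocr_negative_digits() {
    src=$1                      # input photo, e.g. 1.jpg
    base=${src%.*}-negative     # output base, e.g. 1-negative

    # Grayscale, invert, and write an uncompressed TIFF (assumed options).
    convert "$src" -colorspace Gray -negate -compress none "$base.tif" || return 1

    # Pass the negated TIFF, not the original 1.tif, to tesseract.
    # tesseract appends .txt to the output base it is given.
    tesseract "$base.tif" "$base" nobatch digits || return 1

    cat "$base.txt"
}
```

Called as `ocr_negative_digits 1.jpg`, this would leave 1-negative.tif and 1-negative.txt next to the input. Whether inversion alone helps with segmented LED digits is not something this sketch settles.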
From the article http://www.seeingwithsound.com/ocr.htm:

    djpeg -grayscale -dither none -outfile 1.pnm 1.jpg && gocr -i 1.pnm -o 1.txt

and the results are better (no "." but the digits are mostly decoded properly, except 1 instead of 7 and S instead of 5).

I tried http://www.onlineocr.net/ and the results are impressive! I reduced the picture to 176x144, YUYV palette instead of MJPEG, brightness: 100%, 96x96 dpi, number of frames per picture: 50. The result has "." in the right place, but there is still the problem with "7"->"1" and "5"->"S".

I have a question: which algorithm is used by onlineocr.net? For example, free-ocr.com and newocr.com give very bad results (the same as tesseract freshly installed in Debian).

I attach some pictures made by my Logitech C120 (and a negative copy).

On Wednesday, 5 December 2012 08:06:20 UTC+1, Speedy wrote:
>
> Just check out Ray's October 2007 paper "An Overview of the Tesseract OCR
> Engine" where it says:
>
> The first step is
> a connected component analysis in which outlines of
> the components are stored. This was a computationally
> expensive design decision at the time, but had a
> significant advantage: by inspection of the nesting of
> outlines, and the number of child and grandchild
> outlines, it is simple to detect inverse text and
> recognize it as easily as black-on-white text. Tesseract
> was probably the first OCR engine able to handle
> white-on-black text so trivially.
>
> And in fact, in our own application after image preprocessing we pass the
> binarized image as a white-on-black image to tesseract and never had
> problems with that. Of course, our training images are also white-on-black,
> so this might also affect our findings.
>
> Marcus
>
>
> On Tuesday, December 4, 2012 2:58:26 PM UTC+1, zdenop wrote:
>>
>> Where did you find "advertised features of tesseract is that it works
>> equally well for black-on-white and white-on-black text"? I never heard
>> about it.
>> See forum for other experience:
>> https://groups.google.com/d/topic/tesseract-ocr/XoX6t5Ih1IM/discussion
>>
>> --
>> Zdenko
>>
>> On Tue, Dec 4, 2012 at 2:42 PM, Speedy <[email protected]> wrote:
>>
>>> Why is a black background a problem? One of the advertised features of
>>> tesseract is that it works equally well for black-on-white and
>>> white-on-black text.
>>>
>>> Marcus
>>>
>>>
>>> On Tuesday, December 4, 2012 11:11:36 AM UTC+1, zdenop wrote:
>>>
>>>> Search forum. I remember discussion about similar topic.
>>>> AFAIR: tesseract has problem with letter(symbol) that consists of
>>>> several not connected parts (e.g. dots, lines) - solution should be to
>>>> preprocess image (blur).
>>>>
>>>> Generally: black background is problem. Quality of image is too low
>>>> (JPEG, quality: 75), there is no information about DPI... Anyway this
>>>> "LED" font is not standard font, so maybe training will be need.
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> On Tue, Dec 4, 2012 at 12:43 AM, mike oldfield <[email protected]> wrote:
>>>>
>>>>>
>>>>> <https://lh5.googleusercontent.com/-Ly6oR_Rmkag/UL04-iH5XaI/AAAAAAAAAAU/J-T592D8834/s1600/1.jpg>
>>>>> Hello
>>>>>
>>>>> I`d like to recognize LED-like numbers/digits.
>>>>> I attached image (jpg, 680x320, brightness 65%, contrast 100%).
>>>>> Is there any libraries or presets to decode these digits? For example
>>>>> googledocuments conversion and free-ocr.com doesn`t work.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> tesseract-oc...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
<<attachment: 1.jpg>>
<<attachment: 1-n.jpg>>

