I applied some of the image processing that I commonly use to the image you sent.

Before image processing, Tesseract outputs:

The Evolving Student @

After processing, it outputs:

The Evolving Student 0 Children and Email Classroom Requirem nts Online Coursework Dependency v Learning a Vital Social Skill

(The missing "e" is due to the pre-processing, not Tesseract.)

The main thing I notice about the image that you sent is that most of the letters have very low contrast with their surroundings. If you add some pre-processing to intelligently convert the image to black and white, I expect that your results will improve significantly.

Derek

On Feb 19, 2012, at 4:58, Jason Funk wrote:

> My specific examples are screen captures of powerpoint slides. For
> example, what would need to be done to this image?
>
> http://jasonfunk.net/example2.jpeg
>
> On Feb 18, 6:03 pm, Sven Pedersen <sven.peder...@gmail.com> wrote:
>> Image processing, not age. :-)
>>
>> On Saturday, February 18, 2012, Sven Pedersen wrote:
>>> Commercial options have lots of built-in age processing. You can do that
>>> with free software but it does not just do it automatically. Post examples
>>> and you'll get feedback about how to do it with tesseract.
>>> --Sven
>>
>>> On Saturday, February 18, 2012, Jason Funk wrote:
>>
>>>> But what if I am simply trying to do OCR on images that use standard
>>>> normal english fonts? Why isn't it working as well as the commercial
>>>> options which do beautifully? Does the default english language data
>>>> file not contain a lot of the typical fonts?
>>
>>>> On Feb 18, 3:53 pm, "La Monte H. P. Yarroll" <piggy.yarr...@gmail.com>
>>>> wrote:
>>>>> A good example is fraktur (old German black-letter fonts). The only
>>>>> commercial option is over $10,000 for a single copy. There are some
>>>>> languages for which tesseract is the only option.
>>
>>>>> On Sat, Feb 18, 2012 at 4:07 PM, Sven Pedersen <sven.peder...@gmail.com>
>>>>> wrote:
>>
>>>>>> Tesseract is especially good for custom training for a particular
>>>>>> type of text. Accuracy can increase to over 98% for a given font.
>>>>>> Also, it can be trained for foreign languages.
>>>>>> --Sven
>>
>>>>>> On Sat, Feb 18, 2012 at 1:43 PM, Jason Funk <jasonlf...@gmail.com>
>>>>>> wrote:
>>
>>>>>>> I am testing tesseract against some other commercial products and the
>>>>>>> commercial products seem to blow tesseract out of the water in terms
>>>>>>> of quality and accuracy. Is this because tesseract is just not as good
>>>>>>> as the other products? Or perhaps tesseract is designed for a specific
>>>>>>> purpose other than what I am testing it for?
>>
>>>>>>> Maybe a different question would be, for what applications are people
>>>>>>> using tesseract successfully?
>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to tesseract-ocr@googlegroups.com
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> tesseract-ocr+unsubscr...@googlegroups.com
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>>>>>> --
>>>>>> ``All that is gold does not glitter,
>>>>>> not all those who wander are lost;
>>>>>> the old that is strong does not wither,
>>>>>> deep roots are not reached by the frost.
>>>>>> From the ashes a fire shall be woken,
>>>>>> a light from the shadows shall spring;
>>>>>> renewed shall be blade that was broken,
>>>>>> the crownless again shall be king.”
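
A minimal sketch of the kind of black-and-white conversion suggested at the top of this message, for low-contrast text such as the slide capture. It uses Otsu's method to pick a global threshold from the image histogram. This is an assumption chosen for illustration: the message does not say which processing was actually applied to the example image. The function names otsu_threshold and binarize are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values (0-255) that you have already extracted with whatever imaging library you use.

```python
def otsu_threshold(pixels):
    """Pick the grey level that best separates dark and light pixels.

    `pixels` is assumed to be a flat sequence of ints in 0..255.
    Returns the threshold t that maximises the between-class variance
    (Otsu's method); pixels <= t are treated as the dark class.
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0      # number of pixels at or below the candidate level
    sum_dark = 0         # sum of their grey values
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue                 # no dark class yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                    # everything is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    # Hypothetical low-contrast data: grey text (100) on slightly lighter grey (140).
    sample = [100] * 50 + [140] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("distinct output values:", sorted(set(binarize(sample, t))))
```

The binarized pixels would then be written back to an image file and passed to tesseract. A single global threshold is only a starting point. Slides with gradients or coloured backgrounds may need a locally adaptive threshold instead.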