On Fri, Oct 12, 2012 at 10:28:15AM -0700, Tom Morris wrote:
> Sorry, let me clarify. I wasn't suggesting using scans, I was suggesting
> using images created by taking representative texts, representative fonts,
> and rendering page images from them (which I suspect is what your current
> automated training program does.)

It is, thank you for clarifying.

> Except that you have to understand not only the data, but how it interacts
> with the font rasterization machinery. If you just render the text, that's
> all taken care of for you. Rendering images with different font sizes may
> be a good idea if that's representative of what you'll encounter in your
> real world images.
>
> Perhaps it's possible to interpret the font information directly, but my
> suspicion is that you'll be introducing at least as many problems as you're
> solving.

Hmm, yes, OK, good point. Doing it right would be a tough challenge, and
add an extra source for bugs and other issues. Thanks for your thoughts -
I'll leave the idea then, unless I get an urge to get down and dirty with
freetype (which is rather unlikely ;))

Nick

