Re: Thoughts on having the training process take font files directly

Nick White Mon, 15 Oct 2012 06:22:40 -0700

On Fri, Oct 12, 2012 at 10:28:15AM -0700, Tom Morris wrote:
> Sorry, let me clarify.  I wasn't suggesting using scans, I was suggesting
> using images created by taking representative texts, representative fonts,
> and rendering page images from them (which I suspect is what your current
> automated training program does.)


It is, thank you for clarifying.

> Except that you have to understand not only the data, but how it interacts
> with the font rasterization machinery.  If you just render the text, that's
> all taken care of for you.  Rendering images with different font sizes may
> be a good idea if that's representative of what you'll encounter in your
> real world images.
> 
> Perhaps it's possible to interpret the font information directly, but my
> suspicion is that you'll be introducing at least as many problems as you're
> solving.

Hmm, yes, OK, good point. Doing it right would be a tough challenge,
and would add an extra source of bugs and other issues. Thanks for your
thoughts - I'll leave the idea then, unless I get an urge to get
down and dirty with FreeType (which is rather unlikely ;))

Nick
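
As a rough illustration of the rendering approach Tom describes above
(representative text, representative fonts, several font sizes), a minimal
Python sketch might look like the following. It assumes the Pillow imaging
library, which neither poster mentions; the function names, page size, and
margin are hypothetical choices made for the example. Pillow's
ImageFont.truetype loads fonts through FreeType, so the rasterization
details are left to the library rather than handled by hand.

```python
from PIL import Image, ImageDraw, ImageFont


def render_page(text, font, page_size=(1200, 400), margin=40):
    """Render text as black glyphs on a white greyscale page image."""
    page = Image.new("L", page_size, color=255)  # "L" = 8-bit greyscale
    draw = ImageDraw.Draw(page)
    draw.text((margin, margin), text, font=font, fill=0)
    return page


def render_at_sizes(text, font_path, sizes):
    """Yield (size, image) pairs, one rendered page per requested font size.

    font_path is assumed to point at a TrueType/OpenType font file.
    """
    for size in sizes:
        font = ImageFont.truetype(font_path, size)
        yield size, render_page(text, font)
```

Calling render_at_sizes with a sample text, a font file path, and a list of
sizes such as [24, 32, 48] would give one page image per size, each of which
could be saved with the image's save() method and fed to training.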

