On Mon, Oct 15, 2012 at 3:48 AM, Nick White <[email protected]> wrote:
> On Fri, Oct 12, 2012 at 10:28:15AM -0700, Tom Morris wrote:
>> Sorry, let me clarify.  I wasn't suggesting using scans, I was suggesting using
>> images created by taking representative texts, representative fonts, and
>> rendering page images from them (which I suspect is what your current automated
>> training program does.)
>
> It is, thank you for clarifying.

As an added step, you might consider rendering to grayscale, slightly
blurring (optional), adding a bit of noise, and then re-converting to
b&w, to simulate what physical scanners do.  Maybe do this at 1200 dpi
and also downsample to 300 dpi.
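A minimal sketch of that degradation pipeline in plain Python follows. It rests on several assumptions that are not part of Tesseract's training tools. The page is taken to be already rendered as a grayscale image (a list of rows of values from 0 to 255, with 0 as ink). The blur is a simple 3x3 box filter standing in for real scanner optics. The noise is Gaussian. Binarization uses one fixed global threshold. All function names and parameter values (sigma=20, threshold=128, factor=4 for 1200 to 300 dpi) are hypothetical illustrations.

```python
import random


def box_blur(img):
    """3x3 mean filter; edge pixels average only over neighbours inside the image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def add_noise(img, sigma, rng):
    """Add Gaussian noise (mean 0, std dev sigma) to each pixel, clamped to 0..255."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def downsample(img, factor):
    """Average non-overlapping factor x factor blocks (factor 4: 1200 dpi -> 300 dpi)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / (factor * factor)
            for x in range(w)
        ]
        for y in range(h)
    ]


def binarize(img, threshold=128):
    """Global threshold back to b&w: 0 = black ink, 255 = white paper."""
    return [[0 if v < threshold else 255 for v in row] for row in img]


def degrade(img, blur=True, sigma=20.0, factor=4, seed=0):
    """Hypothetical pipeline: optional blur, noise, optional downsample, then threshold."""
    rng = random.Random(seed)  # seeded so the same page always degrades the same way
    if blur:
        img = box_blur(img)
    img = add_noise(img, sigma, rng)
    if factor > 1:
        img = downsample(img, factor)
    return binarize(img)


# Synthetic 1200 dpi "page": 40x40 white pixels with an 8-pixel-wide black vertical stroke.
page = [[0.0 if 16 <= x < 24 else 255.0 for x in range(40)] for y in range(40)]
scanned = degrade(page)  # 10x10 b&w image at the equivalent of 300 dpi
```

Here the downsampling is done on the grayscale image, before thresholding, so that stroke edges are averaged into intermediate gray levels and then cut by the threshold, roughly as a lower-resolution scanner would see them. Passing `factor=1` instead keeps the degraded image at the full 1200 dpi.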

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en