On Mon, Oct 15, 2012 at 3:48 AM, Nick White <[email protected]> wrote:
> On Fri, Oct 12, 2012 at 10:28:15AM -0700, Tom Morris wrote:
>> Sorry, let me clarify. I wasn't suggesting using scans, I was suggesting
>> using images created by taking representative texts, representative fonts, and
>> rendering page images from them (which I suspect is what your current
>> automated training program does.)
>
> It is, thank you for clarifying.

As an added step, you might consider rendering to grayscale, slightly blurring (optional), adding a bit of noise, and then re-converting to b&w, to simulate what physical scanners do. Maybe do this at 1200 dpi and also downsample to 300 dpi.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
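A minimal sketch of the scanner-simulation steps suggested above, using only the Python standard library. It assumes the text has already been rendered to a grayscale page image at high resolution (say 1200 dpi), held as a list of rows of floats where 0.0 is black ink and 1.0 is white paper. The function names and default parameters (blur radius, noise sigma, threshold, seed) are hypothetical illustration choices, not part of Tesseract's training tools. The ordering used here (blur and noise at full resolution, area-average down by a factor of 4 to reach 300 dpi, then threshold) is one reasonable reading of the suggestion, not the only one.

```python
import random
from typing import List

# Grayscale page image: rows of floats, 0.0 = black ink, 1.0 = white paper.
Gray = List[List[float]]
Binary = List[List[int]]


def box_blur(img: Gray, radius: int = 1) -> Gray:
    """Slight blur: mean over a (2r+1) x (2r+1) window, clipped at the image edges."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def add_noise(img: Gray, sigma: float, rng: random.Random) -> Gray:
    """Add Gaussian noise to each pixel, clamping the result to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def downsample(img: Gray, factor: int) -> Gray:
    """Area-average non-overlapping factor x factor blocks (factor 4: 1200 dpi -> 300 dpi)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(
                img[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            / (factor * factor)
            for x in range(w)
        ]
        for y in range(h)
    ]


def binarize(img: Gray, threshold: float = 0.5) -> Binary:
    """Re-convert to black and white with a fixed global threshold (0 = black, 1 = white)."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]


def degrade(img: Gray, factor: int = 4, radius: int = 1,
            sigma: float = 0.1, seed: int = 0) -> Binary:
    """Blur and add noise at the rendering resolution, downsample, then binarize."""
    rng = random.Random(seed)  # seeded so a given page always degrades the same way
    noisy = add_noise(box_blur(img, radius), sigma, rng)
    return binarize(downsample(noisy, factor))


if __name__ == "__main__":
    # An 8x8 "1200 dpi" page: white, with a 4x4 black blob in the top-left corner.
    page = [[0.0 if x < 4 and y < 4 else 1.0 for x in range(8)] for y in range(8)]
    print(degrade(page, sigma=0.0))  # [[0, 1], [1, 1]]
```

Thresholding after the area average means the threshold value decides whether thin strokes survive the move to 300 dpi, which is roughly what happens on a real scanner. To keep a 1200 dpi training image as well, you could call `binarize` on the blurred, noisy image before `downsample`. For real page sizes an imaging library (for example Pillow's `ImageFilter.GaussianBlur` and `Image.resize`) would be far faster than these pure-Python loops.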

