I'm working on a project to automatically process scanned documents. These documents contain handwriting over the printed text, which degrades OCR on the printed blocks; a typical case is a signature written over a name and job title. The handwritten strokes are rounder and thinner than the printed text and are easy for a human reader to tell apart, but they do not differ in color or in any other way that I could exploit with simple image processing.
Generally these blocks of text aren't even recognized as character boxes, so I don't train on them, since the noise is not consistent from one document to the next. A similar issue was discussed here, but with no hint of a solution: http://stackoverflow.com/questions/8158182/removing-noise-from-document-images

Has anyone had a similar case, and can any Leptonica / Tesseract variables help improve the recognition of these characters?

Thanks in advance,
Manuel

