[tesseract-ocr] Removing handwriting from printed documents

Manuel Le Normand Wed, 30 Jul 2014 00:47:33 -0700

I'm working on a project to automatically process scanned documents. These
documents contain handwriting over the printed text that degrades OCR of the
printed blocks; for example, a signature written over a name and job
title. The handwriting is rounder and thinner than the printed
background text and is easily recognizable by a human reader, but it does not
differ by color or by any simple image-processing property I could think of.
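One way to exploit the stroke-thickness difference described above is a connected-component filter on stroke width. The following is a minimal sketch, not a Tesseract or Leptonica API. It assumes a binarized page given as a list of rows (1 = ink, 0 = background). It computes a city-block distance from each ink pixel to the nearest background pixel, then drops any 8-connected component whose mean distance falls below a threshold. The function names and the `min_mean_dist` value are hypothetical and would need tuning per scan resolution. Note the main limitation: handwriting that touches or crosses printed glyphs merges with them into one component, so this whole-component filter only removes handwriting that stands apart from the print.

```python
from collections import deque


def distance_to_background(img):
    """City-block distance from each ink pixel to the nearest background pixel.

    Two-pass (forward/backward) distance transform; pixels outside the
    image are treated as background, so border ink pixels get distance 1.
    """
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any possible city-block distance
    dist = [[inf if img[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else 0
                left = dist[y][x - 1] if x > 0 else 0
                dist[y][x] = min(dist[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else 0
                right = dist[y][x + 1] if x < w - 1 else 0
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def connected_components(img):
    """Return a list of 8-connected ink components, each a list of (y, x)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                pixels = []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def remove_thin_components(img, min_mean_dist=1.25):
    """Return a copy of img with thin-stroked components erased.

    A component whose mean distance-to-background is below min_mean_dist
    (a hypothetical, resolution-dependent threshold) is treated as thin
    handwriting and set to background; thicker print is kept.
    """
    dist = distance_to_background(img)
    out = [row[:] for row in img]
    for pixels in connected_components(img):
        mean_d = sum(dist[y][x] for y, x in pixels) / len(pixels)
        if mean_d < min_mean_dist:
            for y, x in pixels:
                out[y][x] = 0
    return out
```

For a 1-pixel-wide pen stroke every ink pixel has distance 1, so its mean is exactly 1.0. A solid 5x6 block of "print" averages about 1.47, which is why a threshold between the two separates them in this toy setting. On real scans the threshold has to be calibrated against the actual stroke widths.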


Generally these handwritten blocks aren't even recognized as character boxes,
and I don't train on them because this noise is not consistent.
A similar issue was discussed here, but with no hint of a solution:
http://stackoverflow.com/questions/8158182/removing-noise-from-document-images

I was wondering whether any of you have had a similar case, and whether
Leptonica or Tesseract variables can help improve recognition of these
characters.

Thanks in advance,
Manuel

