Hi Tobias,

In the form-processing industry, control characters are typically recognized and then discarded. That allows better debugging and calibration than ignoring them entirely.

--Sven

On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
> Yes, but the issue with blacklist is that the control characters are
> not part of the Unicode character set (or any character set - they are
> symbols). If possible I would like to use a cleaner solution than to
> recognize, map to an arbitrary character and then blacklist.
>
> On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
>> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
>>
>> > Hi,
>>
>> > Is it possible to train Tesseract to not output/recognize a character?
>>
>> Try Tesseract blacklist feature.
>>
>> --
>> Debayan Banerjee
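
A rough sketch of the recognize-then-discard approach described at the top of this thread, in case it helps. It assumes the control symbols were trained to come out as placeholder characters; the two placeholders and their labels below are made up for illustration, and the function name is hypothetical rather than anything from Tesseract. The filtering runs on the OCR output text only, so no Tesseract call is shown.

```python
# Hypothetical mapping: placeholder character emitted by the OCR engine
# -> human-readable name of the control symbol it stands for.
# These choices are assumptions for illustration only.
CONTROL_PLACEHOLDERS = {
    "\u00a4": "field-start mark",
    "\u00a6": "field-end mark",
}


def strip_control_marks(text, placeholders=CONTROL_PLACEHOLDERS):
    """Remove placeholder characters from OCR output.

    Returns the cleaned text plus a list of (index, symbol name) pairs,
    where index is the position in the raw text. The list keeps the
    recognized symbols available for debugging and calibration.
    """
    kept = []
    discarded = []
    for index, ch in enumerate(text):
        if ch in placeholders:
            discarded.append((index, placeholders[ch]))
        else:
            kept.append(ch)
    return "".join(kept), discarded


raw = "\u00a4Name: Tobias\u00a6"
clean, log = strip_control_marks(raw)
print(clean)
print(log)
```

Compared with blacklisting, the discarded symbols stay visible in the log, so you can check whether the engine found each mark where the form layout expects it.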

