In this case we mean the special delimiter symbols you find at the bottom of a check or form. They let a system tell that the document is aligned correctly in the feed, or calibrate distances. You find them in machine-readable fonts such as the MICR font E-13B (http://en.wikipedia.org/wiki/Magnetic_ink_character_recognition) or OCR-B.
--Sven

On Wed, Jun 6, 2012 at 12:04 PM, La Monte H. P. Yarroll <[email protected]> wrote:
> Am I the only one wondering what a printable control character might look
> like? To me "control character" is a thing like carriage return or form feed
> which doesn't have a printable representation.
>
> On Wed, Jun 6, 2012 at 12:48 AM, Sven Pedersen <[email protected]>
> wrote:
>>
>> Hi Tobias,
>> In the form processing industry control characters are typically
>> recognized and then discarded -- that allows better debugging and
>> calibration than just ignoring them entirely.
>> --Sven
>>
>> On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
>> > Yes, but the issue with blacklist is that the control characters are
>> > not part of the Unicode character set (or any character set - they are
>> > symbols). If possible I would like to use a cleaner solution than to
>> > recognize, map to an arbitrary character and then blacklist.
>> >
>> > On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
>> >> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
>> >>
>> >> > Hi,
>> >>
>> >> > Is it possible to train Tesseract to not output/recognize a
>> >> > character?
>> >>
>> >> Try Tesseract blacklist feature.
>> >>
>> >> --
>> >> Debayan Banerjee
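
For anyone finding this thread later, the "recognize and then discard" approach Sven describes could be sketched as a post-processing step on the OCR output, roughly as below. This is only an illustration under stated assumptions: it presumes a custom-trained model that emits private-use code points for the four E-13B control symbols, and the specific code points and the helper name split_control_symbols are hypothetical, not anything Tesseract defines.

```python
# Hypothetical mapping: private-use code points that a custom-trained model
# might emit for the four E-13B control symbols. These code points are
# assumptions for illustration, not Tesseract defaults.
CONTROL_SYMBOLS = {
    "\ue000": "transit",
    "\ue001": "amount",
    "\ue002": "on-us",
    "\ue003": "dash",
}


def split_control_symbols(text):
    """Separate recognized control symbols from the ordinary text.

    Returns (cleaned_text, found), where found is a list of
    (index_in_raw_text, symbol_name) pairs kept for debugging/calibration.
    """
    kept = []
    found = []
    for index, ch in enumerate(text):
        name = CONTROL_SYMBOLS.get(ch)
        if name is None:
            kept.append(ch)  # ordinary character: keep it in the output
        else:
            found.append((index, name))  # control symbol: log, then drop
    return "".join(kept), found


if __name__ == "__main__":
    raw = "\ue000123456789\ue000 0001234\ue002"
    cleaned, found = split_control_symbols(raw)
    print(cleaned)
    print(found)
```

The point of keeping the found list rather than blacklisting outright is that the symbol positions remain available for alignment checks, while the cleaned text contains only the ordinary characters.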

