In this case we mean the special delimiter symbols you find at the
bottom of a check or form. They let a system verify that the document
is aligned correctly in the feed, or calibrate distances. You find them
in MICR fonts
(http://en.wikipedia.org/wiki/Magnetic_ink_character_recognition) such
as E-13B, and in OCR fonts such as OCR-B.
--Sven

On Wed, Jun 6, 2012 at 12:04 PM, La Monte H. P. Yarroll
<[email protected]> wrote:
> Am I the only one wondering what a printable control character might look
> like? To me "control character" is a thing like carriage return or form feed
> which doesn't have a printable representation.
>
> On Wed, Jun 6, 2012 at 12:48 AM, Sven Pedersen <[email protected]>
> wrote:
>>
>> Hi Tobias,
>> In the form processing industry control characters are typically
>> recognized and then discarded -- that allows better debugging and
>> calibration than just ignoring them entirely.
>> --Sven
>>
>> On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
>> > Yes, but the issue with the blacklist is that the control characters are
>> > not part of the Unicode character set (or any character set - they are
>> > symbols). If possible I would like to use a cleaner solution than to
>> > recognize, map to an arbitrary character and then blacklist.
>> >
>> > On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
>> >> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
>> >>
>> >> > Hi,
>> >>
>> >> > Is it possible to train Tesseract to not output/recognize a
>> >> > character?
>> >>
>> >> Try Tesseract blacklist feature.
>> >>
>> >> --
>> >> Debayan Banerjee
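
The "recognize, then discard" approach described in this thread can be
sketched as a small post-processing step. This is only an illustration
under stated assumptions: it assumes the trained data emits some
placeholder character for each control symbol (the four code points
below are a hypothetical choice, as are the names CONTROL_SYMBOLS and
split_control_symbols), and it works on the recognized text after OCR
rather than inside Tesseract. If the positions are not needed, the
tessedit_char_blacklist config variable mentioned above is the simpler
route.

```python
# Hypothetical placeholder characters that the trained data is assumed to
# emit for the form's control symbols; substitute whatever your own
# training maps them to.
CONTROL_SYMBOLS = frozenset("\u2446\u2447\u2448\u2449")


def split_control_symbols(text, controls=CONTROL_SYMBOLS):
    """Separate control symbols from the recognized text.

    Returns (clean_text, hits), where clean_text is the input without
    control symbols and hits is a list of (position, symbol) pairs
    giving each symbol's index in the raw text, for debugging or
    calibration.
    """
    clean = []
    hits = []
    for pos, ch in enumerate(text):
        if ch in controls:
            hits.append((pos, ch))  # keep for debugging, drop from output
        else:
            clean.append(ch)
    return "".join(clean), hits


if __name__ == "__main__":
    # Made-up sample line, not real OCR output.
    raw = "\u2446123456789\u2446 000123\u2448"
    clean, hits = split_control_symbols(raw)
    print(clean)
    print(hits)
```

Keeping the positions around, instead of blacklisting the symbols
outright, is what makes it possible to check afterwards that the
delimiters were found where the form layout says they should be.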
