Hi Tobias,
In the form-processing industry, control characters are typically
recognized and then discarded -- that allows better debugging and
calibration than just ignoring them entirely.
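
A minimal sketch of that recognize-then-discard idea, as a post-processing
step on the OCR text. It assumes the control marks were trained to map to
placeholder characters; the two placeholders and the function name
discard_control_marks are hypothetical, chosen only for illustration, and
nothing here is part of Tesseract itself.

```python
from collections import Counter

# Hypothetical placeholder characters that the control marks are assumed
# to have been mapped to during training. Adjust to your own mapping.
CONTROL_PLACEHOLDERS = frozenset({"\u00a4", "\u00a7"})


def discard_control_marks(ocr_text, placeholders=CONTROL_PLACEHOLDERS):
    """Strip placeholder characters from OCR output.

    Returns (cleaned_text, counts), where counts records how often each
    placeholder was seen, so the recognition step can still be debugged
    and calibrated even though the marks never reach the final text.
    """
    counts = Counter(ch for ch in ocr_text if ch in placeholders)
    cleaned = "".join(ch for ch in ocr_text if ch not in placeholders)
    return cleaned, counts


if __name__ == "__main__":
    cleaned, counts = discard_control_marks("Name: \u00a4Tobias\u00a7")
    print(cleaned)
    print(dict(counts))
```

Keeping the counts is the point of recognizing first: a form where the
expected marks are missing or doubled shows up in the counts, which would
not happen if the marks were never recognized. The blacklist route mentioned
further down the thread is normally configured through Tesseract's
tessedit_char_blacklist variable and suppresses the characters inside the
engine instead.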
--Sven

On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
> Yes, but the issue with the blacklist is that the control characters are
> not part of the Unicode character set (or any character set - they are
> symbols). If possible I would like to use a cleaner solution than to
> recognize, map to an arbitrary character and then blacklist.
>
> On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
>> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
>>
>> > Hi,
>>
>> > Is it possible to train Tesseract to not output/recognize a character?
>>
>> Try Tesseract's blacklist feature.
>>
>> --
>> Debayan Banerjee

