Thanks for your input on this issue. I will go with the recognize-as-an-arbitrary-character route and handle those characters after OCR in my code.
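
For the post-OCR step, here is a rough sketch of what I have in mind. It is plain Python that runs on the text Tesseract returns. The placeholder characters and the function name are only assumptions for illustration, nothing Tesseract itself provides. It also counts what was discarded, which should help with the debugging and calibration Sven mentioned.

```python
# Hypothetical placeholder characters that the control symbols were
# trained as; the real choice depends on the training data.
PLACEHOLDERS = "¤§"


def strip_placeholders(text, placeholders=PLACEHOLDERS):
    """Remove placeholder characters from OCR output.

    Returns (cleaned_text, hits), where hits maps each placeholder
    that occurred in the text to the number of times it occurred.
    """
    # Count occurrences before discarding, so they can be logged.
    hits = {ch: text.count(ch) for ch in placeholders if ch in text}
    # A translation table with only a "delete" set drops those characters.
    table = str.maketrans("", "", placeholders)
    return text.translate(table), hits


if __name__ == "__main__":
    cleaned, hits = strip_placeholders("Name: ¤John§ Smith")
    print(cleaned)
    print(hits)
```

The placeholders would have to be characters that never occur in the real documents, otherwise legitimate text gets stripped as well.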
/Tobias

On Wed, Jun 6, 2012 at 6:48 AM, Sven Pedersen <[email protected]> wrote:
> Hi Tobias,
> In the form processing industry control characters are typically
> recognized and then discarded -- that allows better debugging and
> calibration than just ignoring them entirely.
> --Sven
>
> On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
> > Yes, but the issue with blacklist is that the control characters are
> > not part of the Unicode character set (or any character set - they are
> > symbols). If possible I would like to use a cleaner solution than to
> > recognize, map to an arbitrary character and then blacklist.
> >
> > On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
> >> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
> >>
> >> > Hi,
> >>
> >> > Is it possible to train Tesseract to not output/recognize a character?
> >>
> >> Try Tesseract blacklist feature.
> >>
> >> --
> >> Debayan Banerjee
> >
> > --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

