Re: Is there a way to train Tesseract to NOT output/recognize a character?

Tobias Sebring Thu, 07 Jun 2012 02:01:19 -0700

Thanks for your input on this issue. I will take the route of recognizing them
as an arbitrary character and handling those characters after OCR in my code.
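A minimal sketch of that post-OCR step in Python. It assumes the control symbol has been trained to come out as a single placeholder character. The `~` used below is a hypothetical choice, and so are the names `strip_placeholder` and `FilterResult`. The sketch removes the placeholder but also records where it appeared, which keeps the debugging and calibration information Sven mentions:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FilterResult:
    # OCR text with all placeholder characters removed
    text: str
    # Indices (in the raw OCR text) where a placeholder was found
    removed_positions: List[int] = field(default_factory=list)


def strip_placeholder(raw_text: str, placeholders: str) -> FilterResult:
    """Drop every character listed in `placeholders` (hypothetical helper),
    remembering each one's position for later debugging/calibration."""
    kept: List[str] = []
    removed: List[int] = []
    for i, ch in enumerate(raw_text):
        if ch in placeholders:
            removed.append(i)
        else:
            kept.append(ch)
    return FilterResult("".join(kept), removed)
```

For example, `strip_placeholder("Name:~ John", "~")` returns text `"Name: John"` with `removed_positions == [5]`, so the discarded symbols can still be logged or counted.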


/Tobias

On Wed, Jun 6, 2012 at 6:48 AM, Sven Pedersen <[email protected]> wrote:

> Hi Tobias,
> In the form processing industry, control characters are typically
> recognized and then discarded; that allows better debugging and
> calibration than ignoring them entirely.
> --Sven
>
> On Mon, Jun 4, 2012 at 11:51 AM, TobiasS <[email protected]> wrote:
> > Yes, but the issue with the blacklist is that the control characters are
> > not part of the Unicode character set (or any character set; they are
> > symbols). If possible, I would like a cleaner solution than recognizing
> > them, mapping them to an arbitrary character, and then blacklisting it.
> >
> > On Jun 4, 6:08 pm, Debayan Banerjee <[email protected]> wrote:
> >> On 4 June 2012 20:35, TobiasS <[email protected]> wrote:
> >>
> >> > Hi,
> >>
> >> > Is it possible to train Tesseract to not output/recognize a character?
> >>
> >> Try Tesseract's blacklist feature.
> >>
> >> --
> >> Debayan Banerjee
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
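Here is a minimal sketch of the blacklist route in Python. It assumes a Tesseract build whose command line accepts `-c VAR=VALUE` and an output base of `stdout`. Older releases may need the variable placed in a config file instead. Whether a given engine honors the variable can also vary by version. `tessedit_char_blacklist` is Tesseract's blacklist variable. The helper names and the `#` placeholder are hypothetical. As Tobias points out, the blacklist can only suppress characters in the trained character set, so the symbol still has to be mapped to some character first:

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, blacklist: str, lang: str = "eng") -> List[str]:
    """Build a tesseract CLI call that suppresses the characters in `blacklist`
    (hypothetical helper)."""
    return [
        "tesseract",
        image_path,
        "stdout",                                  # print recognized text instead of writing a file
        "-l", lang,                                # language / traineddata to use
        "-c", f"tessedit_char_blacklist={blacklist}",
    ]


def ocr_with_blacklist(image_path: str, blacklist: str, lang: str = "eng") -> str:
    """Run tesseract (must be on PATH) and return its text output."""
    cmd = build_tesseract_command(image_path, blacklist, lang)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

If the control symbol had been trained to come out as `#`, then `ocr_with_blacklist("form.png", "#")` would ask Tesseract never to emit it. The trade-off, per Sven's point, is that nothing is left afterwards for debugging.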

