I've had similar problems where certain letters are persistently
misrecognized, even when only a single character is being evaluated and
even when I used my own training data.  For example, B is sometimes
consistently mistaken for X or K.  My guess is that it has something to do
with how Tesseract generates the bounding box, but I don't know how to
control this or how to display the bounding box graphically.  If you know
of a way to specify the bounding box around your character manually, that
might help.
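
On displaying the box graphically, one possible starting point: I believe
the makebox config used for training (something like
"tesseract letter.tif letter -psm 10 makebox", where letter.tif is just a
placeholder name) writes a .box file with one line per character in the
form "char left bottom right top page", with the origin at the bottom-left
corner of the image.  Below is a rough sketch, not tested against real
Tesseract output, that parses that text and flips the y axis so the
coordinates match the top-left origin most image libraries use.  The names
Box and parse_box_file are my own, not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One character box in top-left-origin pixel coordinates."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_box_file(text: str, image_height: int) -> List[Box]:
    """Parse box-file text, assuming 'char left bottom right top page' lines.

    Box files measure y from the bottom edge of the image, so each y value
    is subtracted from image_height to get top-left-origin coordinates.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(
            Box(char, left, image_height - top, right, image_height - bottom)
        )
    return boxes


if __name__ == "__main__":
    # Made-up example line for a 100-pixel-high image.
    for box in parse_box_file("B 10 20 50 90 0", image_height=100):
        print(box)
```

Each resulting (left, top, right, bottom) tuple could then be passed to a
drawing routine such as Pillow's ImageDraw.rectangle to overlay the box on
the original image and see what region was actually picked up.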

On Tue, Jul 17, 2012 at 3:04 AM, Nick White <[email protected]> wrote:

> Hi Steve,
>
> On Thu, Jul 12, 2012 at 09:24:23AM -0700, Steve wrote:
> > Have you tried isolating just the letter and seeing if it is correctly
> > identified when you use single-character mode?
>
> Thanks for the thought. I hadn't considered that. I did isolate it
> and ran Tesseract with -psm 10, but recognition was still wrong,
> albeit this time in a new way that I haven't seen in my OCR work
> (replaced by a different character). I suppose this may be due to it
> finding different metrics or something?
>
> I also isolated the offending word and ran tesseract with -psm 8,
> which again produced another different character (still wrong). So
> it seems like it's certainly a tricky one.
>
> Any more thoughts? Thanks again for this.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
