I've run into cases where certain letters are persistently misrecognized, even when only a single character is being evaluated and even when I used my own training data. For example, B is sometimes persistently mistaken for X or K. My guess is that it has something to do with how Tesseract generates the bounding box, but I don't know how to control that or how to display the bounding box graphically. If you know of a way to manually specify the bounding box around your character, that might help.
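One way to at least inspect the boxes, offered as a rough sketch rather than a confirmed fix: as far as I know, running Tesseract with the `makebox` config writes a box file with one line per character in the form `char left bottom right top page`, with the origin at the bottom-left corner of the image. The Python below parses such lines and converts each box to a top-left-origin rectangle, optionally padded, which could then be drawn on the image or used to crop the character by hand. The names `Box`, `parse_box_lines` and `to_crop_rect` are made up for illustration, and the `pad` parameter is my own assumption about what might help.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """One line of a box file; coordinates use a bottom-left origin."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse 'char left bottom right top [page]' lines, skipping malformed ones."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # not a complete box line
        left, bottom, right, top = (int(p) for p in parts[1:5])
        page = int(parts[5]) if len(parts) > 5 else 0
        boxes.append(Box(parts[0], left, bottom, right, top, page))
    return boxes


def to_crop_rect(box: Box, image_width: int, image_height: int,
                 pad: int = 0) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) with a top-left origin.

    The y axis is flipped relative to the box file, the rectangle is grown
    by `pad` pixels on every side, and the result is clamped to the image.
    """
    left = max(box.left - pad, 0)
    upper = max(image_height - box.top - pad, 0)
    right = min(box.right + pad, image_width)
    lower = min(image_height - box.bottom + pad, image_height)
    return (left, upper, right, lower)
```

The resulting tuple uses the same (left, upper, right, lower) ordering that imaging libraries such as PIL expect for cropping or drawing rectangles, so it should be possible to overlay the boxes on the scan, or to crop to a hand-chosen box with a few pixels of margin before running single-character mode.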

On Tue, Jul 17, 2012 at 3:04 AM, Nick White <[email protected]> wrote:
> Hi Steve,
>
> On Thu, Jul 12, 2012 at 09:24:23AM -0700, Steve wrote:
> > Have you tried isolating just the letter and seeing if it is correctly
> > identified when you use single-character mode?
>
> Thanks for the thought. I hadn't considered that. I did isolate it
> and ran Tesseract with -psm 10, but recognition was still wrong,
> albeit this time in a new way that I haven't seen in my OCR work
> (replaced by a different character). I suppose this may be due to it
> finding different metrics or something?
>
> I also isolated the offending word and ran tesseract with -psm 8,
> which again produced another different character (still wrong). So
> it seems like it's certainly a tricky one.
>
> Any more thoughts? Thanks again for this.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

