I've found a rather novel way of training Tesseract that seems to yield better results than the standard training methods. It involves selecting several instances of the same character on a page (for example, 20 capital 'A's), layering them with a median blend to create a 'probability cloud' of what the character looks like, and then training on that one image. In my experience across different fonts and layouts, it has improved accuracy by as much as 35% in certain situations.
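To make the blending step concrete, here is a minimal sketch of a per-pixel median over several crops of the same character. It assumes the crops are already aligned and scaled to identical dimensions and are stored as nested lists of grayscale values; the function name `median_blend` and the sample data are made up for illustration, and this is not anything built into Tesseract.

```python
from statistics import median


def median_blend(glyphs):
    """Blend same-sized grayscale glyph crops by taking the per-pixel median.

    `glyphs` is a list of images, each a list of rows of pixel values.
    Assumption: every crop is already aligned and has the same height/width.
    """
    if not glyphs:
        raise ValueError("need at least one glyph")
    height, width = len(glyphs[0]), len(glyphs[0][0])
    for g in glyphs:
        if len(g) != height or any(len(row) != width for row in g):
            raise ValueError("all glyphs must share the same dimensions")
    # For each pixel position, take the median across all crops.
    return [
        [median(g[y][x] for g in glyphs) for x in range(width)]
        for y in range(height)
    ]


# Three hypothetical 2x3 crops of the same letter (0 = ink, 255 = paper).
samples = [
    [[0, 255, 0], [0, 0, 0]],
    [[0, 255, 255], [0, 0, 255]],
    [[255, 255, 0], [0, 255, 0]],
]
blended = median_blend(samples)
print(blended)
```

Plain lists keep the sketch dependency-free. With real page crops, stacking the images in a NumPy array and calling `np.median(stack, axis=0)` should give the same result much faster.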
I'd like to begin implementing it in a library, but I need to know whether Tesseract has a method to go through a page and slice out all the characters without recognizing them. Basically, I want it to identify each character, cut it out, and save it as a separate image file that I can manipulate. I'm a *terrible* programmer, so I haven't been able to crack this yet. Can anyone help me out?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

