I've found a rather novel way of training Tesseract that seems to yield 
better results than the standard training methods. It involves selecting 
several instances of the same character on a page (for example, 20 capital 'A's) and 
then using a median blend layering method to create a 'probability cloud' 
of what the character looks like, and training on that single image. In my 
experience across different fonts and layouts, it has improved accuracy by as much 
as 35% in certain situations.
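A minimal sketch of the median-blend step, assuming the sample glyphs have already been cropped, aligned and scaled to identical dimensions, and are stored as plain lists of 8-bit grayscale rows (0 = black, 255 = white). `median_blend` is a hypothetical helper name, not part of Tesseract; each output pixel is simply the median of that pixel across all samples, so noise that appears in only a minority of samples drops out.

```python
from statistics import median
from typing import List

# Hypothetical representation: rows of grayscale values, 0 (ink) to 255 (paper).
Glyph = List[List[int]]


def median_blend(glyphs: List[Glyph]) -> Glyph:
    """Return the pixel-wise median of equally sized, pre-aligned glyph images."""
    if not glyphs:
        raise ValueError("need at least one glyph")
    height, width = len(glyphs[0]), len(glyphs[0][0])
    # Assumption: every sample was already cropped and scaled to the same size.
    for g in glyphs:
        if len(g) != height or any(len(row) != width for row in g):
            raise ValueError("all glyphs must have the same dimensions")
    # For an even number of samples, median() averages the two middle values;
    # int() truncates that back to an integer gray level.
    return [
        [int(median(g[y][x] for g in glyphs)) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    a = [[0, 255], [0, 255]]
    b = [[0, 255], [0, 255]]
    noisy = [[255, 255], [0, 0]]  # one badly printed sample
    print(median_blend([a, b, noisy]))  # [[0, 255], [0, 255]]
```

With real scans you would load each crop with an image library and convert it to grayscale first; the blend itself does not depend on how the pixels were obtained.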

I'd like to begin implementing it in a library, but I need to know if 
there's a method in Tesseract to go through and slice out all the 
characters without recognizing them. I basically want to go through, 
identify each character, and have it cut out and saved as a separate 
image file that I can manipulate. I'm a *terrible* programmer, so I haven't 
been able to crack this yet. Can anyone help me out?
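One generic way to cut out glyphs without recognizing them is connected-component labeling on a binarized page. This is a swapped-in technique, not a Tesseract API, and it is only a sketch: it assumes the page has already been thresholded into a bitmap of 1 (ink) and 0 (background), and `find_components` and `crop` are hypothetical names. Characters made of several pieces (the dot on an 'i') or characters that touch each other will not come out as one clean glyph each.

```python
from collections import deque
from typing import List, Tuple

Bitmap = List[List[int]]         # 1 = ink, 0 = background (assumed pre-binarized)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_components(bitmap: Bitmap) -> List[Box]:
    """Return bounding boxes of 8-connected ink regions, in scan order."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def crop(bitmap: Bitmap, box: Box) -> Bitmap:
    """Cut the given bounding box out of the bitmap."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in bitmap[top:bottom + 1]]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 1, 1],
    ]
    for box in find_components(page):
        print(box, crop(page, box))
```

Each crop could then be written out as its own image file and grouped by character before blending.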

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en