Controlling box boundaries?

lab Sun, 09 Nov 2008 04:13:17 -0800

Hi all, I'm trying to use tesseract to recognize some text
interspersed with symbols. I've managed to train a new language as
explained in the wiki, but I find that sometimes tesseract places the
boxes incorrectly during recognition.


Are there any parameters which control the box placement?
I'd prefer user-visible parameters, but I don't mind hacking the code.

For example, if I have a horizontal arrow, then this is sometimes
split into three boxes like [-][-][>]. I'd like the algorithm to be
more lenient and try to recognize the full arrow as a single
character.

I've trained a few samples with the correct box size, but
it doesn't seem to help (i.e. tesseract still insists on splitting
the arrow in its own way). Should I train with a lot more samples?
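
To make "correct box size" concrete, here is a minimal sketch of merging the three pieces of the arrow into a single box before training. It assumes each box-file line has the form "symbol left bottom right top", with pixel coordinates measured from the bottom-left corner of the image. The helper name merge_boxes and the coordinates are made up for illustration and are not part of tesseract.

```python
def merge_boxes(lines, start, count, symbol):
    """Replace `count` consecutive box lines, beginning at index `start`,
    with one box that encloses all of them and is labelled `symbol`.

    Assumed line layout (an assumption, not checked here):
        symbol left bottom right top
    """
    parts = [line.split() for line in lines[start:start + count]]
    # The enclosing box takes the extreme coordinate on each side.
    left = min(int(p[1]) for p in parts)
    bottom = min(int(p[2]) for p in parts)
    right = max(int(p[3]) for p in parts)
    top = max(int(p[4]) for p in parts)
    merged = f"{symbol} {left} {bottom} {right} {top}"
    return lines[:start] + [merged] + lines[start + count:]


# Hypothetical boxes for an arrow that was split as [-][-][>].
boxes = ["- 10 40 20 44", "- 21 40 31 44", "> 32 36 40 48"]
print(merge_boxes(boxes, 0, 3, "→"))
```

This only fixes the training data. It does not change how tesseract segments the image during recognition, which is the part I'm asking about.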

All help appreciated.
