Hi all, I'm trying to use tesseract to recognize some text interspersed with symbols. I've managed to train a new language as explained in the wiki, but I find that sometimes tesseract places the boxes incorrectly during recognition.
Are there any parameters which control the box placement? I'd prefer user-visible parameters, but I don't mind hacking the code. For example, a horizontal arrow is sometimes split into three boxes like [-][-][>]. I'd like the algorithm to be more lenient and try to recognize the full arrow as a single character. I've trained a few samples with the correct box size, but it doesn't seem to help (i.e. tesseract still insists on splitting in its own way). Should I train with a lot more samples? All help appreciated.

