How many is a few?
It sounds to me like you should train a bit more, maybe with a file
that mixes whole arrows and '- - >' sequences?
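
If you go that route, the split boxes in the training box file have to be merged into one box per arrow. Here is a minimal Python sketch of that. It assumes the five-field box line format `<symbol> <left> <bottom> <right> <top>` (newer Tesseract versions append a page number, which this sketch simply drops) and uses '→' as a hypothetical label for the arrow. The helper names and the `max_gap` heuristic are my own. Only '-', '-', '>' boxes that touch, or nearly touch, are merged, so a spaced-out '- - >' stays as three characters.

```python
from typing import List, Tuple

# (symbol, left, bottom, right, top); assumed 5-field box line layout
Box = Tuple[str, int, int, int, int]


def parse_box_line(line: str) -> Box:
    """Parse '<symbol> <left> <bottom> <right> <top>'; extra fields are ignored."""
    parts = line.split()
    left, bottom, right, top = (int(v) for v in parts[1:5])
    return (parts[0], left, bottom, right, top)


def boxes_touch(boxes: List[Box], max_gap: int) -> bool:
    """True if each box starts at most max_gap pixels after the previous one ends."""
    return all(nxt[1] - prev[3] <= max_gap for prev, nxt in zip(boxes, boxes[1:]))


def merge_boxes(boxes: List[Box], symbol: str) -> Box:
    """Bounding box enclosing all given boxes, relabelled with `symbol`."""
    return (
        symbol,
        min(b[1] for b in boxes),
        min(b[2] for b in boxes),
        max(b[3] for b in boxes),
        max(b[4] for b in boxes),
    )


def merge_split_arrows(lines: List[str], pattern: List[str], arrow: str, max_gap: int) -> List[str]:
    """Replace each run of tightly spaced boxes whose symbols match `pattern` with one box."""
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    out: List[Box] = []
    i = 0
    while i < len(boxes):
        window = boxes[i:i + len(pattern)]
        if [b[0] for b in window] == pattern and boxes_touch(window, max_gap):
            out.append(merge_boxes(window, arrow))
            i += len(pattern)
        else:
            out.append(boxes[i])
            i += 1
    return [" ".join(str(v) for v in b) for b in out]


if __name__ == "__main__":
    sample = ["- 10 20 18 24", "- 19 20 27 24", "> 28 17 36 27", "a 40 15 48 27"]
    for line in merge_split_arrows(sample, ["-", "-", ">"], "→", max_gap=2):
        print(line)
```

On the sample, the three tightly packed boxes become one line, "→ 10 17 36 27", and the 'a' box is left alone. A real box file may need a different `max_gap` depending on resolution.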

I'm training for a phonetic script, so I have quite a lot of different,
and longer, signs to deal with. My biggest problem, though, is exactly
the opposite of yours: 'ga' is almost always recognized as an 'ea'
with a bow underneath (which is a valid symbol elsewhere in the text).
I also keep getting the "box overlaps blob in labelled word" failure
and don't know what to do about it.

But back to your problem: have you examined the arrows in the
images? I could imagine there are a few small, lighter-toned pixels in
the arrow tail that make tesseract treat it as two '-' characters
closely following each other.
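
One quick way to check this is to threshold the arrow crop and count the connected groups of dark pixels. More than one group means the glyph is already broken before recognition even starts. Below is a rough sketch in plain Python. It assumes the crop is a 2D list of grayscale values (0 = black) and uses a fixed global threshold, which is a simplification of whatever binarization tesseract actually performs. The function name and sample data are hypothetical.

```python
from collections import deque
from typing import List


def count_components(gray: List[List[int]], threshold: int) -> int:
    """Count 8-connected groups of dark pixels (value < threshold)."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    dark = [[gray[y][x] < threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if dark[y][x] and not seen[y][x]:
                # New group found: flood-fill it with a breadth-first search.
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and dark[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


# Hypothetical 3x9 arrow: the tail has two slightly lighter pixels (140, 150).
ARROW = [
    [255, 255, 255, 255, 255, 255, 255, 0, 255],
    [0, 0, 140, 0, 0, 150, 0, 0, 0],
    [255, 255, 255, 255, 255, 255, 255, 0, 255],
]

if __name__ == "__main__":
    print(count_components(ARROW, 128))  # tail pixels count as background
    print(count_components(ARROW, 160))  # tail pixels count as ink
```

At a threshold of 128 the two lighter tail pixels break this sample into three pieces, which matches the [-][-][>] split. At 160 the arrow stays in one piece. If your real crops behave like this, cleaning up or re-scanning the images may help more than extra training.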


On Nov 9, 1:13 pm, lab <[EMAIL PROTECTED]> wrote:
> Hi all, I'm trying to use tesseract to recognize some text
> interspersed with symbols. I've managed to train a new language as
> explained in the wiki, but I find that sometimes tesseract places the
> boxes incorrectly during recognition.
>
> Are there any parameters which control the box placement?
> I'd prefer user visible parameters but I don't mind hacking the code.
>
> For example, if I have a horizontal arrow, then this is sometimes
> split into three boxes like [-][-][>]. I'd like the algorithm to be
> more lenient and try to recognize the full arrow as a single
> character.
>
> I've trained a few samples with the correct box size, but
> it doesn't seem to help (i.e. tesseract still insists on splitting in
> its own way). Should I train with a lot more samples?
>
> All help appreciated.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---