Unfortunately, this just trains incorrect outlines. The problem is that
applybox doesn't force-chop touching outlines, although it needs to.
You need to render your training text with a small amount of inter-character
spacing so that the samples don't touch in the first place.
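
As a rough way to spot pairs like "vy" before training, something along
these lines could scan a box file for neighbouring boxes that touch or
overlap. This is only a sketch under assumptions: it assumes each box file
line has the form "<label> <left> <bottom> <right> <top>", and the names
Box, ParseBoxes, FindTouchingPairs and min_gap are made up for this example;
none of them are part of Tesseract.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One box file entry: the character label and its bounding box in pixels.
// Assumed layout per line: <label> <left> <bottom> <right> <top>.
struct Box {
  std::string label;
  int left, bottom, right, top;
};

// Parses box file text, one entry per line. Lines that do not match the
// assumed layout are skipped.
std::vector<Box> ParseBoxes(const std::string& text) {
  std::vector<Box> boxes;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    Box b;
    if (fields >> b.label >> b.left >> b.bottom >> b.right >> b.top) {
      boxes.push_back(b);
    }
  }
  return boxes;
}

// Returns every index i for which box i and box i+1 share a text line
// (their vertical ranges overlap) and the horizontal gap between them is
// smaller than min_gap pixels. A gap of zero or less means the boxes touch
// or overlap.
std::vector<std::size_t> FindTouchingPairs(const std::vector<Box>& boxes,
                                           int min_gap) {
  assert(min_gap >= 0);
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i + 1 < boxes.size(); ++i) {
    const Box& a = boxes[i];
    const Box& b = boxes[i + 1];
    const bool same_line = a.bottom < b.top && b.bottom < a.top;
    const int gap = b.left - a.right;
    if (same_line && gap < min_gap) {
      hits.push_back(i);
    }
  }
  return hits;
}
```

Calling FindTouchingPairs(ParseBoxes(text), 1) on the contents of a box
file lists the pairs with no gap between them; if it reports any, the
training image probably needs to be re-rendered with wider spacing.
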
Ray.

On Thu, Mar 5, 2009 at 1:12 AM, Ondra <[email protected]> wrote:

>
> Hi,
>
> I'm new here and not sure where to post this. I'm following the procedure
> for training Tesseract for a new language.
>
> While training Tesseract with Arial, boxing joins the letter pair "vy"
> into an unknown character.
> So I corrected the box file and split it into 2 letters, but Tesseract
> still logs a FATALITY about overlapping characters, which results in a
> different number of classes in the unicharset and the training
> result... training was broken.
>
> I went through the code and I think there's a mistake in applybox.cpp.
>
> On line 416 there is
>
>               new_outline_it.add_to_end (outline_it.extract ());
>
> should be
>
>                OUTLINE* pout = outline_it.extract ();
>                pout->set_outline_box(box);
>                new_outline_it.add_to_end (pout);
>
> to preserve manually adjusted boxes. I'm not sure what impact this will
> have on other training sets, but it works; at least the fatalities about
> overlapping boxes went away.
>
> Am I right or not?
>
> Thanks in advance
>
