Thanks. I had already noticed that, e.g., the English training set has
significant spacing and no FATALITY errors are reported due to invalid box overlap.
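
In case it helps anyone hitting the same thing, here is a rough sketch (not part of Tesseract) of how one might scan a box file for neighbouring boxes that overlap or sit too close together before running training. It assumes each box-file line has the form "symbol left bottom right top"; the names Box, ParseBoxLine, BoxesTouch and FindTouchingPairs are made up for illustration, and min_gap is an arbitrary threshold in pixels. It only compares bounding boxes, so it is a hint that the rendered samples may touch, not proof.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One entry of a box file.
// Assumed line layout: "<symbol> <left> <bottom> <right> <top>".
struct Box {
  std::string symbol;
  int left, bottom, right, top;
};

// Parses a single box-file line; returns false if any field is missing.
bool ParseBoxLine(const std::string& line, Box* box) {
  std::istringstream in(line);
  return static_cast<bool>(in >> box->symbol >> box->left >> box->bottom >>
                           box->right >> box->top);
}

// True if the two boxes share some vertical range and their horizontal
// gap is smaller than min_gap (a negative gap means they overlap).
bool BoxesTouch(const Box& a, const Box& b, int min_gap) {
  const bool same_line = a.bottom < b.top && b.bottom < a.top;
  const int gap = std::max(a.left, b.left) - std::min(a.right, b.right);
  return same_line && gap < min_gap;
}

// Returns every index i for which boxes[i] and boxes[i + 1] touch.
std::vector<std::size_t> FindTouchingPairs(const std::vector<Box>& boxes,
                                           int min_gap) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i + 1 < boxes.size(); ++i) {
    if (BoxesTouch(boxes[i], boxes[i + 1], min_gap)) result.push_back(i);
  }
  return result;
}
```

Any pair this flags (such as the "vy" pair) would be a candidate for re-rendering the training text with a little more inter-character spacing.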

Ondra



On Mar 10, 5:24 am, Ray Smith <[email protected]> wrote:
> Unfortunately this just trains incorrect outlines. The problem is that
> applybox doesn't do forced chopping of touching outlines, but it needs to.
> You need to render your training text with a small amount of inter-character
> spacing so that the samples don't touch in the first place.
> Ray.
>
> On Thu, Mar 5, 2009 at 1:12 AM, Ondra <[email protected]> wrote:
>
> > Hi,
>
> > I'm new and don't know where to write. I'm following the procedure for
> > training Tesseract for a new language.
>
> > While training Tess with Arial, boxing results in joining the "vy" letter
> > pair into an unknown character.
> > So I corrected the box file and split this into 2 letters, but tesseract
> > still logs a FATALITY about overlapping chars, resulting in a different
> > number of classes in unicharset and in the training result... training was broken.
>
> > I went through the code and I think there's a mistake in applybox.cpp.
>
> > On line 416 there is
>
> >               new_outline_it.add_to_end (outline_it.extract ());
>
> > should be
>
> >                OUTLINE* pout = outline_it.extract ();
> >                pout->set_outline_box(box);
> >                new_outline_it.add_to_end (pout);
>
> > to preserve manually adjusted boxes. I'm not sure what the impact will be
> > on other training sets, but this works; at least the fatalities about
> > overlapping boxes went away.
>
> > Am I right or not?
>
> > Thanks in advance.