If your input were spaced enough, v and y wouldn't touch! See the tif files
on the downloads page for examples.
Ray.

On Tue, Mar 10, 2009 at 10:21 PM, Ondra <[email protected]> wrote:

>
> Thanks. I already stated that e.g. the English training set has significant
> spacing and no FATALITY errors are reported due to invalid box overlap.
>
> Ondra
>
>
>
> On Mar 10, 5:24 am, Ray Smith <[email protected]> wrote:
> > Unfortunately this just trains incorrect outlines. The problem is that
> > applybox doesn't do forced chopping of touching outlines, but it needs to.
> > You need to render your training text with a small amount of
> > inter-character spacing so that the samples don't touch in the first place.
> > Ray.
> >
> > On Thu, Mar 5, 2009 at 1:12 AM, Ondra <[email protected]> wrote:
> >
> > > Hi,
> >
> > > I'm new and don't know where to write. I'm following the procedure for
> > > training Tesseract for a new language.
> >
> > > While training Tesseract with Arial, boxing joins the "vy" letter pair
> > > into an unknown character.
> > > So I corrected the box file and split this into 2 letters, but Tesseract
> > > still logs a FATALITY about overlapping characters, resulting in a
> > > different number of classes in the unicharset and the training
> > > result... training was broken.
> >
> > > I went through the code and I think there's a mistake in applybox.cpp.
> >
> > > On line 416 there is
> >
> > >               new_outline_it.add_to_end (outline_it.extract ());
> >
> > > should be
> >
> > >                OUTLINE* pout = outline_it.extract ();
> > >                pout->set_outline_box(box);
> > >                new_outline_it.add_to_end (pout);
> >
> > > to preserve manually adjusted boxes. I'm not sure what the impact will be
> > > on other training sets, but this works; at least the fatalities about
> > > overlapping boxes disappeared.
> >
> > > Am I right or not?
> >
> > > Thanks in advance
> >
>

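The spacing advice in this thread can be checked before training with a rough sketch like the one below. It is not part of Tesseract: `Box`, `ParseBoxes`, `Overlaps` and `FindOverlappingNeighbours` are hypothetical names made up for illustration. It assumes the plain box file layout of one `<symbol> <left> <bottom> <right> <top>` entry per line and only compares each box with the next one. Overlapping bounding boxes are just a proxy for touching outlines, so treat a hit as a hint to add inter-character spacing, not as proof.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One box file entry: "<symbol> <left> <bottom> <right> <top>".
// Coordinates are pixels, with the origin at the bottom-left of the image.
struct Box {
    std::string symbol;
    int left = 0, bottom = 0, right = 0, top = 0;
};

// Parse box file text. Lines that do not match the expected shape are
// skipped; any extra trailing field (such as a page number) is ignored.
std::vector<Box> ParseBoxes(std::istream& in) {
    std::vector<Box> boxes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Box b;
        if (fields >> b.symbol >> b.left >> b.bottom >> b.right >> b.top) {
            boxes.push_back(b);
        }
    }
    return boxes;
}

// True when the two rectangles share interior area.
// Boxes that merely share an edge are not counted as overlapping.
bool Overlaps(const Box& a, const Box& b) {
    return a.left < b.right && b.left < a.right &&
           a.bottom < b.top && b.bottom < a.top;
}

// Return index pairs (i, i + 1) for consecutive boxes that overlap.
std::vector<std::pair<std::size_t, std::size_t>> FindOverlappingNeighbours(
        const std::vector<Box>& boxes) {
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t i = 0; i + 1 < boxes.size(); ++i) {
        if (Overlaps(boxes[i], boxes[i + 1])) {
            hits.emplace_back(i, i + 1);
        }
    }
    return hits;
}
```

To use it, open the box file with `std::ifstream`, pass the stream to `ParseBoxes`, and print the symbols of every pair returned by `FindOverlappingNeighbours`. A pair such as "v" followed by "y" showing up there suggests the training image needs more spacing between those glyphs.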