If your input were spaced enough, v and y wouldn't touch! See the tif files on the downloads page for examples.
Ray.

On Tue, Mar 10, 2009 at 10:21 PM, Ondra <[email protected]> wrote:
>
> Thanks. I already stated that, e.g., the English training set has significant
> spacing and no FATALITIEs are reported due to invalid box overlap.
>
> Ondra
>
> On Mar 10, 5:24 am, Ray Smith <[email protected]> wrote:
> > Unfortunately this just trains incorrect outlines. The problem is that
> > applybox doesn't do forced chopping of touching outlines, but it needs to.
> > You need to render your training text with a small amount of
> > inter-character spacing so that the samples don't touch in the first place.
> > Ray.
> >
> > On Thu, Mar 5, 2009 at 1:12 AM, Ondra <[email protected]> wrote:
> > > Hi,
> > >
> > > I'm new and don't know where to post this. I'm following the procedure
> > > for training tesseract for a new language.
> > >
> > > While training Tess with Arial, boxing joins the "vy" letter pair into
> > > an unknown character.
> > > So I corrected the box file and split this into 2 letters, but tesseract
> > > still logs a FATALITY about overlapping chars, resulting in a different
> > > number of classes in the unicharset and the training result... training
> > > was broken.
> > >
> > > I went through the code and I think there's a mistake in applybox.cpp.
> > >
> > > On row 416 there is
> > >
> > > new_outline_it.add_to_end (outline_it.extract ());
> > >
> > > which should be
> > >
> > > OUTLINE* pout = outline_it.extract ();
> > > pout->set_outline_box(box);
> > > new_outline_it.add_to_end (pout);
> > >
> > > to preserve manually adjusted boxes. I'm not sure what the impact will be
> > > on other training sets, but this works; at least the fatalities about
> > > overlapping boxes went away.
> > >
> > > Am I right or not?
> > >
> > > Thanks in advance
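
For anyone hitting the same FATALITY, a quick way to see which neighbouring boxes collide before running training is to scan the box file for overlaps. The following is only a rough sketch, not part of tesseract: it assumes the box-file line layout `<glyph> <left> <bottom> <right> <top>`, and the names `Box`, `ParseBoxLine`, `BoxesOverlap` and `FindAdjacentOverlaps` are made up for illustration. Note that overlapping bounding boxes do not prove the outlines themselves touch; the check only points at pairs such as "vy" that are worth inspecting.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One entry of a box file. Hypothetical helper type, not a tesseract class.
struct Box {
  std::string glyph;
  int left, bottom, right, top;
};

// Parses one line of the assumed form "<glyph> <left> <bottom> <right> <top>".
// Returns false if the line does not contain all five fields.
bool ParseBoxLine(const std::string& line, Box* box) {
  std::istringstream in(line);
  return static_cast<bool>(in >> box->glyph >> box->left >> box->bottom >>
                           box->right >> box->top);
}

// True if the two rectangles share interior area. Boxes that merely share
// an edge are not counted as overlapping.
bool BoxesOverlap(const Box& a, const Box& b) {
  return a.left < b.right && b.left < a.right &&
         a.bottom < b.top && b.bottom < a.top;
}

// Returns every index i for which boxes[i] overlaps boxes[i + 1].
std::vector<std::size_t> FindAdjacentOverlaps(const std::vector<Box>& boxes) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i + 1 < boxes.size(); ++i) {
    if (BoxesOverlap(boxes[i], boxes[i + 1])) hits.push_back(i);
  }
  return hits;
}
```

To use it, read the box file line by line with `std::getline`, pass each line to `ParseBoxLine`, and print the glyph pairs reported by `FindAdjacentOverlaps`. Those are the places where the training image probably needs more inter-character spacing.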

