Unfortunately, this just trains incorrect outlines. The problem is that applybox doesn't do forced chopping of touching outlines, but it needs to. You need to render your training text with a small amount of inter-character spacing so that the samples don't touch in the first place.

Ray.
On Thu, Mar 5, 2009 at 1:12 AM, Ondra <[email protected]> wrote:
>
> Hi,
>
> I'm new and don't know where else to write. I'm following the procedure for training
> Tesseract for a new language.
>
> While training Tess with Arial, boxing joins the letter pair "vy"
> into an unknown character.
> So I corrected the box file and split it into two letters, but Tesseract still
> logs a FATALITY about an overlapping char, resulting in a different number of
> classes in the unicharset and the training result... training was broken.
>
> I went through the code and I think there's a mistake in applybox.cpp.
>
> On row 416 there is
>
>     new_outline_it.add_to_end (outline_it.extract ());
>
> It should be
>
>     OUTLINE* pout = outline_it.extract ();
>     pout->set_outline_box(box);
>     new_outline_it.add_to_end (pout);
>
> to preserve manually adjusted boxes. I'm not sure what impact this will have
> on other training sets, but it works; at least the fatalities about
> overlapping boxes are gone.
>
> Am I right or not?
>
> Thanks in advance

