Hi,

I'm new here and don't know where else to post this. I'm following the
procedure for training Tesseract for a new language.

While training Tesseract with Arial, boxing joins the letter pair "vy"
into one unknown character.
I corrected the box file by splitting that entry into 2 letters, but
Tesseract still logs FATALITY errors about overlapping characters. This
leads to a different number of classes in the unicharset and in the
training result, so the training was broken.

I went through the code and I think there's a mistake in applybox.cpp.

On line 416 there is

               new_outline_it.add_to_end (outline_it.extract ());

should be

                OUTLINE* pout = outline_it.extract ();
                pout->set_outline_box(box);
                new_outline_it.add_to_end (pout);

to preserve the manually adjusted boxes. I'm not sure what impact this
will have on other training sets, but it works for me: at least the
fatalities about overlapping boxes went away.
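
To show what I think is going on, here is a small toy model. It is only
a sketch: Box, Outline, set_box and move_outlines are made-up stand-ins,
not Tesseract's real classes, and the coordinates are invented. It
assumes that an outline moved without the fix keeps the bounding box
computed from its own points, so two kerned glyphs still overlap even
after the box file has been split by hand.

```cpp
#include <vector>

// Toy stand-in for a bounding box; not Tesseract's real BOX class.
struct Box {
    int left, bottom, right, top;
};

// True when the two boxes share interior area (touching edges do not count).
bool overlaps(const Box& a, const Box& b) {
    return a.left < b.right && b.left < a.right &&
           a.bottom < b.top && b.bottom < a.top;
}

// Toy stand-in for an outline; it only carries its bounding box.
struct Outline {
    Box box;  // box computed from the outline's own points
    // Hypothetical setter, mirroring the set_outline_box call proposed above.
    void set_box(const Box& b) { box = b; }
};

// Move outlines into a new list. When keep_manual is true, each moved
// outline is stamped with the hand-edited box from the box file.
std::vector<Outline> move_outlines(const std::vector<Outline>& src,
                                   const Box& manual_box,
                                   bool keep_manual) {
    std::vector<Outline> dst;
    for (Outline o : src) {  // copy each outline, then optionally re-box it
        if (keep_manual) {
            o.set_box(manual_box);
        }
        dst.push_back(o);
    }
    return dst;
}
```

With a "v" outline spanning x = 0..12 and a "y" outline spanning
x = 10..22, the moved outlines overlap when keep_manual is false. With
hand-edited boxes that meet at x = 11 and keep_manual set to true, they
no longer overlap.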

Am I right or not?

Thanks in advance
