Sven,

Now I'm curious. What kind of tweaks are you talking about?
Appending new fonts to the existing language training data? Pre-enhancement of the image (a skew transformation on italic characters, contrast enhancement on low-contrast fonts, etc.)? I'd love to know about any other tweaks there are.

Thanks

On Sep 29, 10:39 pm, Sven Pedersen <[email protected]> wrote:
> Thanks Calomer.
>
> Bonny, is the language you're trying to improve using a different set
> of characters (alphabet)? If so, you'll need to do a lot of training
> as Calomer described. Otherwise you'll just need some tweaks. The font
> may be an issue.
> --Sven
>
> On Thu, Sep 29, 2011 at 12:39 PM, Calomer <[email protected]> wrote:
> > I'll try my best to answer, though I'm hardly qualified.
> >
> > According to the training instructions
> > (on http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
> > and general OCR knowledge, you cannot train by supplying the new
> > characters alone. You need training images, and you need to create
> > boxes (with any box editor; I only used Qt Box Editor). Once you have
> > created boxes around the characters in your new TIFF image and labeled
> > them accordingly, you should be ready for training.
> >
> > Keep in mind that you'll need an x-height of at least 12 pixels
> > (preferably around 20 pixels), and variety in the training images
> > helps performance.
> >
> > Follow the training instructions, train your own language file, and
> > try OCR again. If it still fails, I'm sure someone with wider
> > knowledge than me will be able to answer your further questions.
> >
> > On Sep 29, 2:44 pm, Bonny <[email protected]> wrote:
> >> Does nobody know, or is the question too silly?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> “All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king.”
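For illustration only, the two image pre-enhancements asked about in the question (contrast enhancement and undoing an italic slant) could be sketched roughly as below. This is a minimal sketch, not anything Tesseract itself does: it assumes a grayscale image stored as a list of rows of 0..255 integers, the function names `stretch_contrast` and `unshear` are hypothetical, and the slant value is assumed to be estimated separately (not shown). A real pipeline would more likely use an image library such as Leptonica.

```python
from typing import List

# Hypothetical image representation: grayscale rows, 0 (black) .. 255 (white).
Image = List[List[int]]


def stretch_contrast(img: Image) -> Image:
    """Linearly map the image's [min, max] intensity range onto [0, 255]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: there is no range to stretch, return a copy unchanged.
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def unshear(img: Image, slant: float, background: int = 255) -> Image:
    """Shift each row horizontally to undo a right-leaning italic slant.

    `slant` is the assumed horizontal lean in pixels per row of height.
    The bottom row stays put and rows higher up are shifted further left,
    so a right-leaning stroke becomes roughly vertical. Uses whole-pixel
    shifts without interpolation; pixels shifted off the edge are dropped.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = []
    for y, row in enumerate(img):
        # Shift grows with distance from the bottom row.
        shift = round(slant * (height - 1 - y))
        new_row = [background] * width
        for x, p in enumerate(row):
            nx = x - shift
            if 0 <= nx < width:
                new_row[nx] = p
        out.append(new_row)
    return out
```

The two steps compose, e.g. `unshear(stretch_contrast(img), 0.2)` with a hypothetical slant of 0.2; whether either step actually improves recognition for a given font would need to be checked on real samples.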

