Yes, I think you have covered the tweaks I thought of suggesting.
Sven

On Friday, September 30, 2011, Calomer <[email protected]> wrote:
> Sven,
>
> Now I'm curious. What kind of tweaks are you talking about?
>
> Appending old language training data with new fonts?
> Pre-enhancement of the image (skew transformation on italic
> characters, contrast enhancement on low-contrast fonts, etc.)?
>
> I'd love to know of any other tweaks there are.
>
> Thanks
>
> On Sep 29, 10:39 pm, Sven Pedersen <[email protected]> wrote:
>> Thanks Calomer.
>>
>> Bonny, is the language you're trying to improve using a different set
>> of characters (alphabet)? If so, you'll need to do a lot of training
>> as Calomer described. Otherwise you'll just need some tweaks. The font
>> may be an issue.
>> --Sven
>>
>> On Thu, Sep 29, 2011 at 12:39 PM, Calomer <[email protected]> wrote:
>> > I'll try my best to answer, though I'm hardly qualified.
>>
>> > According to the training instructions (on
>> > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>> > and general OCR knowledge, you cannot train solely on new characters.
>> > You need training images, and you need to create boxes (with any box
>> > editor, but I have only used Qt Box Editor). Once you have created
>> > boxes on your new TIFF image and labeled them accordingly, you should
>> > be ready for training.
>>
>> > Keep in mind that you'll need an x-height of at least 12 pixels
>> > (preferably around 20 pixels). Variety in the images would be nice
>> > for increased performance.
>>
>> > Follow the training instructions, train your own language file, and
>> > try OCR again. If it fails again, I'm sure someone with wider
>> > knowledge than me will be able to answer your further questions.
>>
>> > On Sep 29, 2:44 pm, Bonny <[email protected]> wrote:
>> >> Does nobody know, or is the question too silly?
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> --
>> ``All that is gold does not glitter,
>> not all those who wander are lost;
>> the old that is strong does not wither,
>> deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>> a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>> the crownless again shall be king.''
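A rough sketch of two of the tweaks discussed in this thread, in plain Python with no imaging library: linear contrast stretching for low-contrast scans, and the scale factor needed to bring a measured x-height up to the roughly 20 pixels suggested in the thread. The function names and the list-of-rows grayscale representation are assumptions made for illustration; none of this is part of Tesseract.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grayscale values (0-255) so that the darkest
    pixel maps to `low` and the brightest to `high`.

    `pixels` is a list of rows, each row a list of ints (hypothetical
    representation chosen to keep the sketch dependency-free).
    """
    flat = [v for row in pixels for v in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        # A flat image has no contrast to stretch; return a copy.
        return [row[:] for row in pixels]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (v - darkest) * scale) for v in row] for row in pixels]


def upscale_factor(x_height_px, target_px=20):
    """Factor by which to enlarge an image so its x-height reaches
    `target_px`. Never shrinks: returns at least 1.0."""
    return max(1.0, target_px / x_height_px)


if __name__ == "__main__":
    faint = [[100, 120], [140, 150]]
    print(stretch_contrast(faint))
    print(upscale_factor(10))
```

In practice an imaging library would do the actual resampling and deskewing; the point here is only that both adjustments are applied to the image before it is passed to the OCR engine.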

