Hi Thura,

It looks like you're using a resolution of 600 pixels per inch. That may be too high. Have you checked the training instructions about character pixel height?

You're right that the glyphs are similar to each other, but that alone should not make such a big difference. You may be able to get better results by using character sequences and post-processing.

--Sven
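To make the resolution point concrete, here is a rough sketch of how rendered text height scales with resolution. It relies only on the fact that one point is 1/72 inch. The 12 pt font size is a made-up example (I don't know the size used in the training image), and `char_height_px` is just a helper name for this illustration, not part of tesseract. The result is the nominal em height, so actual glyphs will usually be somewhat smaller.

```python
def char_height_px(point_size: float, dpi: float) -> float:
    """Nominal text height in pixels for a font of `point_size` points
    rendered or scanned at `dpi` pixels per inch (1 pt = 1/72 inch)."""
    return point_size * dpi / 72.0


if __name__ == "__main__":
    # Hypothetical 12 pt font; substitute the size used in the training image.
    for dpi in (300, 600):
        print(f"{dpi} dpi -> {char_height_px(12, dpi):.1f} px")
```

Halving the resolution halves the pixel height, so it is worth comparing these numbers against whatever character height the training instructions recommend.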
On Thu, Dec 22, 2011 at 9:49 PM, Thura Hlaing <[email protected]> wrote:
> Hello guys,
>
> I am trying to train tesseract (3.01) for Burmese script. I am following
> the guide exactly, but I couldn't get an acceptable accuracy rate (it is
> less than 50%). Although Burmese script has only 33 letters (consonants),
> there are a lot of consonant + diacritic combinations (ligatures), so I
> need to train more than 900 characters (glyphs). I have generated a tiff
> image & box file, including 7 samples for infrequent characters and 20
> for frequent characters.
>
> Is it because the characters (glyphs) in the training set are quite
> similar to each other? (In Burmese, each consonant can have several
> ligatures, which are quite similar to each other, combined with one or
> more diacritics.)
>
> I have attached my training image (converted to png from tif) & box file.
> Any help or tip to improve accuracy is greatly appreciated. Thanks in
> advance.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

