On 20 June 2010 11:22, Sriranga(77yrsold) <[email protected]> wrote:
> Thilanka,
> which lang you know well. Since new version 3.0( r=400) has Chinese
> Lang.trained data. You are
> aware that chinese script has number of strokes and being such a case, I
> firmly believed that English handwriting can be trained easily and
> successfully. please forward sample handwritten to my email address =
> [email protected]. I shall try myself and feedback to you..

No. The issues with Chinese and with handwriting are completely different. With Chinese, the issue is the large character set; with handwriting - that is, handwritten printed characters, not cursive - it's the wide range of variation. Write the same sentence 10 times, then look at the page: no two characters will be exactly alike (think of this as training on multiple examples from the same font - you have to learn the variations). On top of that, handwriting is 'unique'; each person's handwriting should be thought of as a different font - and there's no way to train for that. You may have some luck, but don't be surprised if the results are dramatically less accurate than for printed text.

Cursive writing has its own set of issues - in particular, character segmentation of joined letters. Tesseract has no support for this type of segmentation - it already has problems when training from regular printed pages where there is not enough space between the characters. (Sriranga, you have encountered this limitation a number of times, if the issue tracker is anything to go by.)

In summary:
For a single person, with printed characters: you might be lucky.
For multiple people, with printed characters: don't have high expectations.
For cursive: expect close to nothing.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
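
As a footnote to the segmentation point above, here is a toy sketch of why joined letters are a problem. It is purely illustrative: the function `split_on_blank_columns` and the tiny '#'/'.' bitmaps are made up for this example and are not Tesseract's code or its actual segmentation method. It assumes a naive splitter that only cuts where a whole column of pixels is blank.

```python
# Toy illustration only: NOT Tesseract's segmentation code.
# Each bitmap row is a string; '#' is ink, '.' is background.

def split_on_blank_columns(rows):
    """Return (start, end) column spans separated by fully blank columns."""
    width = len(rows[0])
    # A column counts as ink if any row has a '#' in it.
    has_ink = [any(row[x] == "#" for row in rows) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new span begins
        elif not ink and start is not None:
            spans.append((start, x))       # a blank column ends the span
            start = None
    if start is not None:
        spans.append((start, width))       # span runs to the right edge
    return spans

# Two shapes with a blank column between them (well-spaced print).
spaced = [
    "##.##",
    "##.##",
]
# The same shapes joined by a connecting stroke (cursive, or tight print).
joined = [
    "##.##",
    "#####",
]

spaced_spans = split_on_blank_columns(spaced)
joined_spans = split_on_blank_columns(joined)
print(spaced_spans)  # two separate spans
print(joined_spans)  # one merged span
```

With a gap, the two shapes come out as separate spans; once a single stroke bridges the gap, the splitter sees one blob. Anything that relies on whitespace between characters will hit the same wall with cursive.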

