Is there any special treatment for handwritten characters? I tried some characters but got varied results. Usually simple characters are detected accurately, but compound characters can be totally off. For example,
<https://lh5.googleusercontent.com/-D0bVlV_6mQ0/U5zvz9DddcI/AAAAAAAAAQw/aPIEeIoAWE4/s1600/6.jpg>

is interpreted as two characters, 青 and 争. But this is actually a relatively good case. For

<https://lh6.googleusercontent.com/-yMkKjgGpwww/U5zwhY1kUFI/AAAAAAAAAQ4/dLWwN8nM6n8/s1600/12.jpg>

the result is totally off: the character is interpreted as three parts from top to bottom, and the bottom part is interpreted as the symbol ^. The worst case is

<https://lh3.googleusercontent.com/-CveCXByxnCc/U5zw9d9pTYI/AAAAAAAAARA/ced8N6VFXFc/s1600/7.jpg>

which produces complete garbage.

In all my use cases, I only need to detect a single Chinese character at a time. My question is: what can I do to improve the accuracy of the recognition?

Thanks

On Monday, November 12, 2012 4:45:18 PM UTC-5, sventech wrote:
>
> To get better results you will need to increase the contrast and add a
> border. That image is very poor quality for text. Generally you'll want a
> bitmap type image format like TIFF or PNG, not JPG (which is for pictures).
> Read the FAQ for more info on preparing images for OCR, especially the part
> about x-height.
>
> As far as I know, Google has not released the full training data; however,
> you can tell a lot by unpacking the language files.
> --Sven
>
> On Sun, Nov 4, 2012 at 8:00 PM, Rong Xiao <[email protected]> wrote:
>
>> <https://lh3.googleusercontent.com/-gwRhWSanaHo/UJcdfs8hiSI/AAAAAAAAABQ/8jlKa2ZypFs/s1600/chi_test4.jpg>
>>
>> Such as this image. It's not very complex.
>>
>> On Friday, November 2, 2012 10:03:00 PM UTC+8, sventech wrote:
>>
>>> Preprocessing can help. Give us some example images and we may be able
>>> to help.
>>> --Sven
>>>
>>> On Fri, Nov 2, 2012 at 7:25 AM, Rong Xiao <[email protected]> wrote:
>>> > Hi, I have tried tesseract-ocr on Chinese, but I found that it does
>>> > well on only a few fonts. I want to know what kinds of fonts are
>>> > included in chi_sim.traineddata. If I want better accuracy, do I need
>>> > to train it myself?
>>> >
>>> > Thanks
>>>
>>> --
>>> ``All that is gold does not glitter,
>>> not all those who wander are lost;
>>> the old that is strong does not wither,
>>> deep roots are not reached by the frost.
>>> From the ashes a fire shall be woken,
>>> a light from the shadows shall spring;
>>> renewed shall be blade that was broken,
>>> the crownless again shall be king.”

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6217f1fe-ecf1-41a6-a697-1fc5f1f39209%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
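
For anyone trying the advice quoted in this thread (increase the contrast, add a border, and recognize one character at a time), here is a rough sketch in plain Python. It assumes the image has already been loaded as a grid of 0-255 grayscale values. The helper names, the 10-pixel border width, and the file names are only illustrative and do not come from Tesseract itself. Page segmentation mode 10 is Tesseract's "treat the image as a single character" mode. Newer releases spell the flag `--psm`, while the 3.x releases use `-psm`, so adjust it for your version.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed input format)


def stretch_contrast(img: Gray) -> Gray:
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def add_border(img: Gray, width: int = 10, fill: int = 255) -> Gray:
    """Pad the image with `width` pixels of `fill` (white) on every side."""
    new_width = len(img[0]) + 2 * width
    top = [[fill] * new_width for _ in range(width)]
    bottom = [[fill] * new_width for _ in range(width)]
    body = [[fill] * width + row + [fill] * width for row in img]
    return top + body + bottom


def tesseract_command(image_path: str, out_base: str,
                      lang: str = "chi_sim", psm: int = 10) -> List[str]:
    """Build the command line for single-character recognition.

    Use "-psm" instead of "--psm" on Tesseract 3.x.
    """
    return ["tesseract", image_path, out_base, "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    tiny = [[100, 150], [120, 200]]  # toy 2x2 "image" for illustration
    prepared = add_border(stretch_contrast(tiny), width=2)
    print(len(prepared), "x", len(prepared[0]))
    print(" ".join(tesseract_command("char.png", "out")))
```

The prepared pixels would still have to be written out in a lossless format such as PNG or TIFF, as the quoted reply recommends, before passing the file to the command above. Whether this is enough for handwritten input is not something this sketch can promise.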

