To get better results, you will need to increase the contrast and add a border. That image is very poor quality for text. Generally you'll want a lossless bitmap format like TIFF or PNG, not JPG (which is meant for photographs). Read the FAQ for more info on preparing images for OCR, especially the part about x-height.
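As a rough illustration of the two preprocessing steps mentioned above (raising contrast and adding a border), here is a minimal sketch in plain Python. It assumes the image has already been loaded as a grayscale grid of 0-255 integers. The function names, the linear contrast stretch, and the 10-pixel white border are illustrative choices, not anything Tesseract prescribes.

```python
def stretch_contrast(pixels):
    """Linearly rescale gray values so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in pixels]
    # Integer arithmetic keeps the result exact and within 0..255.
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in pixels]


def add_border(pixels, width=10, fill=255):
    """Pad the image on all four sides with `width` pixels of `fill` (white by default)."""
    blank_row = [fill] * (len(pixels[0]) + 2 * width)
    padded = [[fill] * width + row + [fill] * width for row in pixels]
    top = [blank_row[:] for _ in range(width)]
    bottom = [blank_row[:] for _ in range(width)]
    return top + padded + bottom


# Hypothetical low-contrast 2x2 sample: all values sit in a narrow gray band.
sample = [[100, 150],
          [120, 140]]
prepared = add_border(stretch_contrast(sample), width=1)
```

In practice you would do the same thing with an image library or an image editor and then save the result as PNG or TIFF before running OCR. The point is only that the text should span the full dark-to-light range and should not touch the image edge.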
As far as I know, Google has not released the full training data; however, you can tell a lot by unpacking the language files.

--Sven

On Sun, Nov 4, 2012 at 8:00 PM, Rong Xiao <[email protected]> wrote:
> <https://lh3.googleusercontent.com/-gwRhWSanaHo/UJcdfs8hiSI/AAAAAAAAABQ/8jlKa2ZypFs/s1600/chi_test4.jpg>
>
> Such as this image. It's not very complex.
>
> On Friday, November 2, 2012 10:03:00 PM UTC+8, sventech wrote:
>> Preprocessing can help. Give us some example images and we may be able to
>> help.
>> --Sven
>>
>> On Fri, Nov 2, 2012 at 7:25 AM, Rong Xiao <[email protected]> wrote:
>> > Hi, I have tried tesseract-ocr on Chinese, but I found that it does well on
>> > only a few fonts. I want to know what kind of fonts are included in
>> > chi_sim.traineddata? If I expect better accuracy, do I need to train it
>> > myself?
>> >
>> > Thanks

--
``All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king.”

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

