Hi Tom,

Thanks! Setting the threshold worked for me.

Much appreciated,
Max

On Thursday, September 10, 2015 at 3:04:59 PM UTC-4, Tom Morris wrote:
>
> On Wednesday, September 9, 2015 at 1:30:38 PM UTC-4, Max Heiber wrote:
>>
>> Here's an example where the Chinese characters are very large and clear,
>> but Tesseract gets the wrong result. Could you advise on what image
>> processing could help Tesseract's accuracy?
>>
>
> What have you tried so far?
>
> I got the following with about 30 seconds of playing with an image editor:
>
> 爸爸说我
>
> It looks correct to me, but I don't read Chinese.
>
> Basically I just thresholded to send anything that wasn't very white to be
> completely black. I didn't even bother inverting the white on black.
>
> Tom
>
>>
>> Thanks for your help!
>>
>> On Monday, November 12, 2012 at 4:45:02 PM UTC-5, Sven Pedersen wrote:
>>>
>>> To get better results you will need to increase the contrast and add a
>>> border. That image is very poor quality for text. Generally you'll want a
>>> bitmap type image format like TIFF or PNG, not JPG (which is for pictures).
>>> Read the FAQ for more info on preparing images for OCR, especially the part
>>> about x-height.
>>>
>>> As far as I know, Google has not released the full training data;
>>> however, you can tell a lot by unpacking the language files.
>>> --Sven
>>>
>>> On Sun, Nov 4, 2012 at 8:00 PM, Rong Xiao <[email protected]> wrote:
>>>
>>>> <https://lh3.googleusercontent.com/-gwRhWSanaHo/UJcdfs8hiSI/AAAAAAAAABQ/8jlKa2ZypFs/s1600/chi_test4.jpg>
>>>>
>>>> Such as this image. It's not very complex.
>>>>
>>>> On Friday, November 2, 2012 10:03:00 PM UTC+8, sventech wrote:
>>>>
>>>>> Preprocessing can help. Give us some example images and we may be able
>>>>> to help.
>>>>> --Sven
>>>>>
>>>>> On Fri, Nov 2, 2012 at 7:25 AM, Rong Xiao <[email protected]> wrote:
>>>>> > Hi, I have tried tesseract-ocr on Chinese, but I found that it does
>>>>> > well on only a few fonts. I want to know what kind of fonts are
>>>>> > included in chi_sim.traineddata. If I expect better accuracy, do I
>>>>> > need to train it myself?
>>>>> >
>>>>> > Thanks
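For anyone who lands on this thread later: the thresholding Tom describes (anything that isn't very white becomes completely black) can also be done in a few lines of Python instead of an image editor. This is only a rough sketch, not what Tom actually ran. The cutoff of 200 is a guess that would need tuning per image, the function names are made up for illustration, and the file helper assumes the third-party Pillow package is installed.

```python
def threshold_near_white(rows, cutoff=200):
    """Binarize 8-bit grayscale pixels given as a list of rows.

    Pixels at or above `cutoff` (an assumed value, tune per image)
    become white (255); everything else becomes black (0).
    """
    return [[255 if p >= cutoff else 0 for p in row] for row in rows]


def invert(rows):
    """Swap black and white, e.g. to turn white-on-black text into black-on-white."""
    return [[255 - p for p in row] for row in rows]


def binarize_file(src_path, dst_path, cutoff=200, invert_output=False):
    """Apply the same threshold to an image file (requires Pillow)."""
    from PIL import Image  # third-party; imported here so the helpers above work without it

    img = Image.open(src_path).convert("L")  # convert to 8-bit grayscale
    if invert_output:
        # Threshold and invert in one pass: near-white becomes black.
        lut = lambda p: 0 if p >= cutoff else 255
    else:
        lut = lambda p: 255 if p >= cutoff else 0
    img.point(lut).save(dst_path)  # save as PNG or TIFF rather than JPG
```

Something like `binarize_file("chi_test4.jpg", "chi_test4_bw.png")` would write a black-and-white PNG that can then be passed to Tesseract. Tom's result did not need the inversion, so `invert_output` is left off by default.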

