[tesseract-ocr] Re: tesseract-ocr does not very well on chinese

Max Heiber Sun, 13 Sep 2015 18:29:22 -0700

Hi Tom,

Thanks! Setting the threshold worked for me.
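
For anyone who finds this thread later, here is a minimal sketch of the kind of thresholding Tom describes below. It is not his exact procedure (he used an image editor). The function names and the cutoff of 230 are assumptions for illustration only, and the image is represented as a plain list of rows of 8-bit grayscale values so that the example needs no imaging library.

```python
# Hypothetical helpers; names and the default cutoff are illustrative assumptions.

def threshold_near_white(pixels, cutoff=230):
    """Keep only very bright pixels as white (255); send everything else to black (0).

    `pixels` is a list of rows, each row a list of grayscale values in 0..255.
    """
    return [[255 if p >= cutoff else 0 for p in row] for row in pixels]


def invert(pixels):
    """Swap black and white, e.g. to turn white-on-black text into black-on-white."""
    return [[255 - p for p in row] for row in pixels]


if __name__ == "__main__":
    sample = [
        [255, 240, 100],
        [0, 229, 230],
    ]
    binary = threshold_near_white(sample)
    print(binary)
    print(invert(binary))
```

With an imaging library such as Pillow, the same mapping could presumably be applied to a real file with something like `img.convert("L").point(lambda p: 255 if p >= 230 else 0)`. The right cutoff depends on the image, so expect to try a few values.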


Much appreciated,

Max

On Thursday, September 10, 2015 at 3:04:59 PM UTC-4, Tom Morris wrote:
>
> On Wednesday, September 9, 2015 at 1:30:38 PM UTC-4, Max Heiber wrote:
>>
>> Here's an example where the Chinese characters are very large and clear, 
>> but Tesseract gets the wrong result. Could you advise on what image 
>> processing could help Tesseract's accuracy?
>>
>
> What have you tried so far?
>
> I got the following with about 30 seconds of playing with an image editor:
>
> 爸爸说我
>
>
> It looks correct to me, but I don't read Chinese.
>
>
> Basically I just thresholded so that anything that wasn't very white became 
> completely black.  I didn't even bother inverting the white on black.
>
>
> Tom
>  
>
>>
>> Thanks for your help!
>>
>>
>>
>> On Monday, November 12, 2012 at 4:45:02 PM UTC-5, Sven Pedersen wrote:
>>>
>>> To get better results you will need to increase the contrast and add a 
>>> border. That image is very poor quality for text. Generally you'll want a 
>>> lossless image format like TIFF or PNG, not JPG (which is for photographs). 
>>> Read the FAQ for more info on preparing images for OCR, especially the part 
>>> about x-height.
>>>
>>> As far as I know, Google has not released the full training data; 
>>> however, you can tell a lot by unpacking the language files.
>>> --Sven
>>>
>>>
>>> On Sun, Nov 4, 2012 at 8:00 PM, Rong Xiao <[email protected]> wrote:
>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-gwRhWSanaHo/UJcdfs8hiSI/AAAAAAAAABQ/8jlKa2ZypFs/s1600/chi_test4.jpg>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Such as this image. It's not very complex.
>>>>
>>>> On Friday, November 2, 2012 10:03:00 PM UTC+8, sventech wrote:
>>>>
>>>>> Preprocessing can help. Give us some example images and we may be able 
>>>>> to help. 
>>>>> --Sven 
>>>>>
>>>>> On Fri, Nov 2, 2012 at 7:25 AM, Rong Xiao <[email protected]> wrote: 
>>>>> > Hi, I have tried tesseract-ocr on Chinese, but I found that it does 
>>>>> > well on only a few fonts. I want to know which fonts are included in 
>>>>> > chi_sim.traineddata. If I want better accuracy, do I need to train it 
>>>>> > myself? 
>>>>> > 
>>>>> > thanks 
>>>>>
>>>>
