Is there any special treatment for handwritten characters? I tried some 
characters but got mixed results. Simple characters are usually 
detected accurately, but compound characters can be totally off. For example,

<https://lh5.googleusercontent.com/-D0bVlV_6mQ0/U5zvz9DddcI/AAAAAAAAAQw/aPIEeIoAWE4/s1600/6.jpg>

is interpreted as two characters, 青 and 争. But this is actually a relatively 
good case. For

<https://lh6.googleusercontent.com/-yMkKjgGpwww/U5zwhY1kUFI/AAAAAAAAAQ4/dLWwN8nM6n8/s1600/12.jpg>

the result is totally off: the character is split into three parts from top to 
bottom, and the bottom part is interpreted as the symbol ^. The worst case is 

<https://lh3.googleusercontent.com/-CveCXByxnCc/U5zw9d9pTYI/AAAAAAAAARA/ced8N6VFXFc/s1600/7.jpg>

which produces complete garbage. 

In all my use cases, I only need to detect a single Chinese character at a time. 
My question is: what can I do to improve recognition accuracy? 
Thanks
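Because each image contains exactly one character, Tesseract's page segmentation mode 10 ("treat the image as a single character") may be worth trying, since it tells the engine not to look for several characters or lines in the image. A minimal Python sketch, assuming the `tesseract` binary and the chi_sim language data are installed; the helper names, the `out` output base and the `6.png` filename are hypothetical:

```python
import subprocess
from pathlib import Path

# Page segmentation mode 10: "Treat the image as a single character."
SINGLE_CHAR_PSM = "10"


def single_char_command(image_path, out_base="out", lang="chi_sim", legacy=False):
    """Build a tesseract command line that treats the image as one character.

    Tesseract 3.x spells the option "-psm"; 4.x and later use "--psm".
    """
    psm_flag = "-psm" if legacy else "--psm"
    return ["tesseract", str(image_path), out_base, "-l", lang, psm_flag, SINGLE_CHAR_PSM]


def recognize_single_char(image_path, out_base="out", lang="chi_sim", legacy=False):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    cmd = single_char_command(image_path, out_base, lang, legacy)
    subprocess.run(cmd, check=True)
    # tesseract writes its result to <out_base>.txt
    return Path(out_base + ".txt").read_text(encoding="utf-8").strip()
```

For example, `recognize_single_char("6.png")` runs `tesseract 6.png out -l chi_sim --psm 10` and returns the contents of `out.txt`. Pass `legacy=True` for older 3.x releases that expect `-psm`.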




On Monday, November 12, 2012 4:45:18 PM UTC-5, sventech wrote:
>
> To get better results you will need to increase the contrast and add a 
> border. That image is very poor quality for text. Generally you'll want a 
> lossless bitmap format like TIFF or PNG, not JPG (which is meant for photographs). 
> Read the FAQ for more info on preparing images for OCR, especially the part 
> about x-height.
>
> As far as I know, Google has not released the full training data; however, 
> you can learn a lot by unpacking the language files.
> --Sven
>
>
> On Sun, Nov 4, 2012 at 8:00 PM, Rong Xiao <[email protected]> 
> wrote:
>
>>  
>> <https://lh3.googleusercontent.com/-gwRhWSanaHo/UJcdfs8hiSI/AAAAAAAAABQ/8jlKa2ZypFs/s1600/chi_test4.jpg>
>>
>> For example, this image. It's not very complex.
>>
>> On Friday, November 2, 2012 10:03:00 PM UTC+8, sventech wrote:
>>
>>> Preprocessing can help. Give us some example images and we may be able 
>>> to help. 
>>> --Sven 
>>>
>>> On Fri, Nov 2, 2012 at 7:25 AM, Rong Xiao <[email protected]> wrote: 
>>> > Hi, I have tried tesseract-ocr on Chinese, but I found that it does well on 
>>> > only a few fonts. Which fonts are included in chi_sim.traineddata? If I 
>>> > want better accuracy, do I need to train it myself? 
>>> > 
>>> > Thanks 
>>> > 
>>> > -- 
>>> > You received this message because you are subscribed to the Google 
>>> > Groups "tesseract-ocr" group. 
>>> > To post to this group, send email to [email protected] 
>>> > To unsubscribe from this group, send email to 
>>> > [email protected] 
>>> > For more options, visit this group at 
>>> > http://groups.google.com/group/tesseract-ocr?hl=en 
>>>
>>>
>>>
>>> -- 
>>> ``All that is gold does not glitter, 
>>>   not all those who wander are lost; 
>>> the old that is strong does not wither, 
>>>   deep roots are not reached by the frost. 
>>> From the ashes a fire shall be woken, 
>>>   a light from the shadows shall spring; 
>>> renewed shall be blade that was broken, 
>>>   the crownless again shall be king.” 
>>>
>>
>
>  
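The preprocessing Sven describes above (raise the contrast, add a border, keep the result in a lossless format) can be sketched in plain Python. This is a minimal sketch, assuming the image has already been decoded into rows of 0-255 grayscale values with dark ink on a light background; the function names, the fixed threshold of 128 and the 10-pixel border are illustrative choices, not Tesseract settings:

```python
def stretch_contrast(pixels):
    """Rescale grayscale values so the darkest pixel becomes 0 and the lightest 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Map each pixel to pure black (0) or white (255); the threshold is an assumption."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


def add_border(pixels, width=10, fill=255):
    """Pad the image with a uniform white margin of `width` pixels on every side."""
    cols = len(pixels[0]) + 2 * width
    padded = [[fill] * cols for _ in range(width)]
    for row in pixels:
        padded.append([fill] * width + list(row) + [fill] * width)
    padded.extend([fill] * cols for _ in range(width))
    return padded


def preprocess(pixels, threshold=128, border=10):
    """Contrast stretch, then binarize, then add a white border."""
    return add_border(binarize(stretch_contrast(pixels), threshold), border)
```

In practice an imaging library would handle decoding and saving; the point is that the cleaned, bordered bitmap should be written out as PNG or TIFF rather than JPG before it is passed to Tesseract.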

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6217f1fe-ecf1-41a6-a697-1fc5f1f39209%40googlegroups.com.
