training and using multiple language problem.

Choi Thu, 29 Aug 2013 00:24:58 -0700

Hello all :-)
I'm trying to improve Chinese recognition accuracy.
I found that my book contains some characters that are not included in the 
unicharset of chi_sim.traineddata (
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.chi_sim.tar.gz&can=2&q=
).
So I trained on some of these characters and named the result 
chi_ext.traineddata.
Then I initialized Tesseract with "chi_sim+chi_ext".
But Tesseract does not seem to recognize the new characters, even the ones 
used in training.
(Of course the images are not the same. The training images were scanned, 
and the images to be recognized were captured with a cellphone camera. The 
captured images have lower resolution and thicker characters.)
(It rarely recognizes them correctly, approximately one time in ten.)
When I initialize Tesseract with "chi_ext+chi_sim", 
Tesseract recognizes every character as one of the new characters.
I'm thoroughly confused.
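
One way to confirm which characters from the book are really absent from a 
unicharset is to compare the two directly. The sketch below is only an 
illustration and not part of Tesseract: the function names load_unichars and 
missing_characters are hypothetical, and it assumes a plain-text unicharset 
file whose first line is an entry count and whose remaining lines each begin 
with the character followed by whitespace.

```python
from typing import Iterable, List, Set


def load_unichars(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every entry line.

    Assumption: the first line holds the entry count and is skipped.
    """
    it = iter(lines)
    next(it, None)  # skip the entry-count header line
    unichars: Set[str] = set()
    for line in it:
        parts = line.split()
        if parts:
            unichars.add(parts[0])
    return unichars


def missing_characters(text: str, unichars: Set[str]) -> List[str]:
    """Return the characters of text that are absent from unichars.

    Whitespace is ignored; each missing character is reported once,
    in the order it first appears.
    """
    missing: List[str] = []
    for ch in text:
        if ch.isspace() or ch in unichars or ch in missing:
            continue
        missing.append(ch)
    return missing


if __name__ == "__main__":
    # Hypothetical file names: an unpacked unicharset and a text sample.
    with open("chi_sim.unicharset", encoding="utf-8") as f:
        known = load_unichars(f)
    with open("book_sample.txt", encoding="utf-8") as f:
        print("".join(missing_characters(f.read(), known)))
```

The unicharset has to be unpacked from the .traineddata file first (I believe 
the combine_tessdata tool can do this with its -u option). Running the same 
check against the unicharset of chi_ext shows whether a character that fails 
to be recognized was really part of the extra training.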


Is there a priority that depends on the order of the languages? 
If there is, can I modify it?
If not, how can I solve this problem?
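
To see how much the order of the language string matters for a given image, 
both orderings can be run on the same input and the outputs compared. The 
sketch below assumes only the standard command-line form 
"tesseract imagename outputbase -l lang1+lang2"; the helper names, the file 
name page.png and the output naming are hypothetical, and it makes no claim 
about how Tesseract weighs the languages internally.

```python
import subprocess
from typing import List, Sequence


def build_command(image_path: str, output_base: str,
                  languages: Sequence[str]) -> List[str]:
    """Build a tesseract command line with the languages joined by '+'."""
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_both_orders(image_path: str, languages: Sequence[str]) -> List[str]:
    """Run tesseract once per ordering and return the output base names.

    Tesseract writes its text result to <output_base>.txt.
    """
    bases: List[str] = []
    for order in (list(languages), list(reversed(languages))):
        output_base = "out_" + "_".join(order)
        subprocess.run(build_command(image_path, output_base, order),
                       check=True)
        bases.append(output_base)
    return bases


if __name__ == "__main__":
    # Requires the tesseract binary and both traineddata files in tessdata.
    print(run_both_orders("page.png", ["chi_sim", "chi_ext"]))
```

Comparing out_chi_sim_chi_ext.txt with out_chi_ext_chi_sim.txt on the same 
cellphone image makes the effect of the ordering visible character by 
character.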

Sorry for my terrible English grammar. 
I will wait for your answer. Thank you.
Good day.



-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

