Hello all :-) I'm trying to improve Chinese recognition accuracy. I found that my book contains some characters that are not included in the unicharset of chi_sim.traineddata ( https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.chi_sim.tar.gz&can=2&q= ). So I trained some of these characters and named the result chi_ext.traineddata, and I initialize Tesseract with "chi_sim+chi_ext". But Tesseract does not seem to recognize the new characters, even the ones used in training. (Of course the images are not the same: the training images were scanned, while the images being recognized were captured with a cellphone camera. The captured images have lower resolution and thicker characters.) The new characters are recognized correctly only rarely, roughly one time in ten. When I initialize Tesseract with "chi_ext+chi_sim" instead, it recognizes every character as one of the new characters. I'm thoroughly confused.
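To make the two setups concrete, here is a minimal Python sketch of the equivalent command lines. It assumes the `tesseract` command-line tool and its `-l` option with `+`-joined language names, which is only an illustration and not necessarily how the application initializes Tesseract. The file name `page.png`, the output base names, and the helper `build_tesseract_command` are hypothetical.

```python
from typing import List, Sequence


def build_tesseract_command(
    image_path: str, output_base: str, languages: Sequence[str]
) -> List[str]:
    """Build a tesseract CLI argument list that loads several languages.

    The languages are joined with '+' in the order given, so the caller
    controls which traineddata file is listed first.
    """
    if not languages:
        raise ValueError("at least one language is required")
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


# The two load orders compared in this post (file names are placeholders).
sim_first = build_tesseract_command("page.png", "out_sim_first", ["chi_sim", "chi_ext"])
ext_first = build_tesseract_command("page.png", "out_ext_first", ["chi_ext", "chi_sim"])

print(" ".join(sim_first))
print(" ".join(ext_first))
```

Each list could be passed to `subprocess.run` on a machine where `tesseract` is installed and both traineddata files are in the tessdata directory, which makes it easy to compare the output of the two orders on the same image.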
Does the order in which the languages are listed determine their priority? If so, can I change that priority? If not, how can I solve this problem? Sorry for my poor English grammar. I will wait for your answer. Thank you, and have a good day.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

