Hi Richard,

It sounds like you're doing the right things by using a whitelist to restrict the range of characters and by disabling the dictionary.
Beyond that, I'd strongly recommend you read the advice on this wiki page to see if you can improve things further: https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality

If you're able to crop out everything but the text you care about before handing the image to Tesseract, things will be much easier. If not, you could try recognising the Chinese characters as well and removing them in a post-processing step after recognition. You'd do that by using something like '-l eng+chi_sim' on the command line (though of course you'd have to abandon the whitelist).

There are probably other possibilities, but those are what spring to mind. I hope this helps, and do let us know how you get on.

Nick
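P.S. In case it's useful, here is a rough sketch of the "recognise everything, then filter" idea in Python. Treat it as an illustration rather than a tested recipe: the function names `strip_cjk` and `ocr_english_only` are made up for this example, the Unicode ranges are simply my guess at what counts as "Chinese" for your images, and it assumes the chi_sim language data is installed and that your Tesseract build accepts `stdout` as the output base (older builds may only write to a file).

```python
import re
import subprocess

# Assumed definition of "Chinese" output: CJK punctuation, CJK Extension A,
# and the main CJK Unified Ideographs block. Adjust to suit your images.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff]+")


def strip_cjk(text):
    """Remove CJK characters, tidy leftover spaces, and drop empty lines."""
    cleaned = CJK_PATTERN.sub(" ", text)
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in cleaned.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_english_only(image_path):
    """Run tesseract with both languages, then filter out the Chinese text.

    Hypothetical helper: assumes `tesseract` is on the PATH and that this
    build supports `stdout` as the output base.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", "eng+chi_sim"],
        capture_output=True,
        text=True,
        check=True,
    )
    return strip_cjk(result.stdout)
```

You would call something like `ocr_english_only("scan.png")` on each image. Since the whitelist is gone at this point, you may want a further check on the filtered text (for example, that it only contains the characters you expect) to catch stray misrecognitions.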

