Hi Richard,

It sounds like you're already doing the right things by using a
whitelist to restrict the range of characters and by disabling the
dictionary.

Beyond that, I'd strongly recommend you read the advice on this wiki
page to see if you can improve things further:
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality

If you're able to crop out everything but the text you care about
before handing the image to Tesseract, things will be much easier. If
not, you could try recognising the Chinese characters as well, and
remove them in a post-processing step. You'd do that by using
something like '-l eng+chi_sim' on the command line (though of course
you'd have to abandon the whitelist, since it would stop Tesseract
from outputting the Chinese characters at all).
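
In case it's useful, here is a rough sketch of that post-processing
idea in Python. It is only an illustration under a few assumptions:
the function names strip_cjk and ocr_without_chinese are made up for
this example, the Unicode ranges are my guess at what counts as
"Chinese" in your images, and it expects the tesseract binary to be on
your PATH with the chi_sim language data installed.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# CJK punctuation, Extension A, Unified Ideographs, and fullwidth forms.
# This range list is an assumption; widen or narrow it to suit your images.
CJK_PATTERN = re.compile(
    r"[\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]+"
)


def strip_cjk(text):
    """Remove CJK characters, then tidy the whitespace left behind."""
    cleaned = CJK_PATTERN.sub(" ", text)
    # Collapse runs of whitespace on each line and drop lines that end up empty.
    lines = (" ".join(line.split()) for line in cleaned.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_without_chinese(image_path):
    """Run Tesseract with English plus Simplified Chinese, keep the rest."""
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "out"
        # Tesseract appends ".txt" to the output base name it is given.
        subprocess.run(
            ["tesseract", str(image_path), str(out_base), "-l", "eng+chi_sim"],
            check=True,
        )
        raw = out_base.with_suffix(".txt").read_text(encoding="utf-8")
    return strip_cjk(raw)
```

You would call ocr_without_chinese("scan.png") on each image (the
file name is just a placeholder). If the Latin text you want can also
contain fullwidth digits or punctuation, you would need to take the
last range out of the pattern.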

There are probably other possibilities, but those are what spring to
mind.

I hope this helps, and do let us know how you get on.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
