Hi Nick,

Thanks for your suggestion. Yes, I have read the wiki page you pointed to. As for the bad-case image I uploaded here, my only guess is that the blurring hurts recognition. So I tried sharpening the image first and then running OCR, but the result is still wrong.
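Richard doesn't say which sharpening method he used. One common choice is unsharp masking: blur the image, then add back a multiple of the difference between the original and the blurred copy, which raises contrast across edges. The following is a minimal, dependency-free sketch, assuming an 8-bit grayscale image stored as a list of rows. The names `box_blur` and `unsharp_mask` and the 3x3 box blur are illustrative choices, not part of Tesseract or of anything described in this thread. In practice an image library would do this (Pillow, for example, provides `ImageFilter.UnsharpMask`).

```python
def box_blur(img):
    """3x3 mean filter; out-of-range neighbours are clamped to the nearest edge pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding `amount` times (original - blurred), clipped to 0..255."""
    blurred = box_blur(img)
    result = []
    for y, row in enumerate(img):
        new_row = []
        for x, v in enumerate(row):
            sharpened = v + amount * (v - blurred[y][x])
            new_row.append(int(round(min(max(sharpened, 0), 255))))
        result.append(new_row)
    return result
```

For example, `unsharp_mask([[50, 50, 200, 200]])` returns `[[50, 0, 250, 200]]`: the dark side of the edge gets darker and the bright side brighter, while flat regions are left unchanged. Sharpening also amplifies noise, so it is not guaranteed to improve OCR accuracy.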
By the way, do you think enabling Chinese recognition would make the recognition process slower? As you know, character recognition is a template-matching process: given an unknown character, more templates mean more candidates to compare against, which takes longer.

*"If you're able to crop out all but the text you care about before handing it to Tesseract to process, things will be much easier."*

This is what I was thinking too. I just haven't figured out how to quickly select candidate patches.

Richard

On Tuesday, February 18, 2014 2:30:38 AM UTC+8, Nick White wrote:
>
> Hi Richard,
>
> It sounds like you're doing the right things as far as using a
> whitelist to configure the range of characters, and disabling the
> dictionary.
>
> Beyond that, I'd strongly recommend you read the advice on this wiki
> page to see if you can improve things further:
> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>
> If you're able to crop out all but the text you care about before
> handing it to Tesseract to process, things will be much easier. If
> not, you could try recognising the Chinese characters, and do
> post-processing after recognition to remove them. You'd do that by
> using something like '-l eng+chi_sim' on the command line (though
> of course you'd have to abandon the whitelist).
>
> There are probably other possibilities, but those are what spring to
> mind.
>
> I hope this helps, and do let us know how you get on.
>
> Nick
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

