Re: OCR on a small set of char but with other letters I do not care

Richard Wang Mon, 17 Feb 2014 17:21:06 -0800

Hi Nick,

Thanks for your suggestion. Yes, I have read the wiki page you pointed to.
As for the bad-case image I uploaded here, my only guess is that the
blurring makes recognition harder. So I tried sharpening the image first
and then running OCR, but the result is still wrong.
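For reference, "sharpening" here usually means convolving the image with a
small kernel like the one below. This is a minimal pure-Python sketch, not the
filter Richard actually used (the post does not say which one). It assumes a
grayscale image held as a list of rows of 0-255 values.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of 0-255 ints (assumed layout)

# A common 3x3 sharpening kernel: boosts the centre pixel and subtracts its
# four direct neighbours. The weights sum to 1, so flat areas are unchanged.
KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def sharpen(img: Image) -> Image:
    """Convolve img with KERNEL; border pixels are copied unchanged."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]  # copy so the input is not modified
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += KERNEL[ky][kx] * img[y + ky - 1][x + kx - 1]
            # Clamp back into the valid 0-255 grey range.
            out[y][x] = max(0, min(255, acc))
    return out
```

In practice an image library would do this; Pillow, for instance, provides
`ImageFilter.SHARPEN` and `ImageFilter.UnsharpMask`. Sharpening only
exaggerates edges that are already there, so it may not recover strokes that
the blur has merged together.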


By the way, do you think enabling Chinese recognition will make the
recognition process slower? As you know, character recognition is a
template-matching process: given an unknown character, more templates
means more candidates to match against, which takes longer.
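If Chinese recognition is enabled along the lines Nick suggests in the quoted
message (`-l eng+chi_sim`), the Chinese text still has to be removed from the
output afterwards, and the character whitelist can be applied at that point
instead of inside Tesseract. A minimal sketch, assuming the OCR output is
already available as a Python string; `strip_chinese` is a hypothetical helper,
and the Unicode ranges it treats as "Chinese" are an assumption to adjust.

```python
from typing import Optional

# Code-point ranges treated as "Chinese" here. This list is an assumption:
# extend it if your images contain other CJK blocks.
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms (e.g. full-width commas)
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def strip_chinese(text: str, whitelist: Optional[str] = None) -> str:
    """Drop CJK characters (and, optionally, anything outside whitelist)
    from OCR output, collapse runs of spaces, and skip lines left empty."""
    allowed = set(whitelist) if whitelist is not None else None
    kept = []
    for line in text.splitlines():
        chars = [c for c in line if not is_cjk(c)]
        if allowed is not None:
            # Keep spaces so words stay separated; drop everything else not allowed.
            chars = [c for c in chars if c in allowed or c == " "]
        cleaned = " ".join("".join(chars).split())
        if cleaned:
            kept.append(cleaned)
    return "\n".join(kept)
```

For example, `strip_chinese("价格 123 元\nABC", whitelist="0123456789")` keeps
only `"123"`. Whether the extra language actually slows Tesseract down enough
to matter is best measured on your own images.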

"If you're able to crop out all but the text you care about before
handing it to Tesseract to process, things will be much easier."

This is what I have been thinking too. I just haven't figured out how
to quickly select the candidate patches.
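One simple way to find candidate patches, offered as a sketch rather than
something proposed in the thread, is connected-component labelling on a
binarized image: ink strokes form blobs, and their bounding boxes can be
filtered (by size or aspect ratio) and cropped before being handed to
Tesseract. This assumes the image is already binarized into rows with 1 for ink
and 0 for background; `component_boxes` and `crop` are hypothetical helper
names.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]          # binarized image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def component_boxes(binary: Grid, min_pixels: int = 1) -> List[Box]:
    """Bounding boxes of 4-connected ink regions with at least min_pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            count = 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:  # discard specks of noise
                boxes.append((top, left, bottom, right))
    return boxes


def crop(img: Grid, box: Box) -> Grid:
    """Cut the inclusive box out of img."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

A real pipeline would usually merge nearby boxes into word or line regions
before cropping, since a single Chinese character often splits into several
disconnected components.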

Richard.


On Tuesday, February 18, 2014 2:30:38 AM UTC+8, Nick White wrote:
>
> Hi Richard, 
>
> It sounds like you're doing the right things as far as using a 
> whitelist to configure the range of characters, and disabling the 
> dictionary. 
>
> Beyond that, I'd strongly recommend you read the advice on this wiki 
> page to see if you can improve things further: 
> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality 
>
> If you're able to crop out all but the text you care about before 
> handing it to Tesseract to process, things will be much easier. If 
> not, you could try recognising the Chinese characters, and do 
> post-processing after recognition to remove them. You'd do that by 
> using something like '-l eng+chi_sim' on the command line (though 
> of course you'd have to abandon the whitelist). 
>
> There are probably other possibilities, but those are what spring to 
> mind. 
>
> I hope this helps, and do let us know how you get on. 
>
> Nick 
>
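The setup Nick refers to in the quoted reply, a character whitelist plus a
disabled dictionary, can be expressed as a Tesseract command line. A minimal
sketch, assuming a Tesseract build whose command line accepts `-c VAR=VALUE`
(with versions that do not, the same variables can go in a config file passed
as the last argument); `tesseract_args` is a hypothetical helper that only
builds the argument list.

```python
from typing import List


def tesseract_args(image: str, outbase: str, whitelist: str,
                   lang: str = "eng") -> List[str]:
    """Build a tesseract command line that restricts the character set and
    disables the dictionary word lists.

    Assumes a tesseract build that accepts -c VAR=VALUE on the command line.
    """
    return [
        "tesseract", image, outbase,
        "-l", lang,
        # Only these characters may appear in the recognised text.
        "-c", f"tessedit_char_whitelist={whitelist}",
        # Don't load the system and frequent-word dictionaries.
        "-c", "load_system_dawg=0",
        "-c", "load_freq_dawg=0",
    ]
```

The list could then be passed to `subprocess.run(..., check=True)`, which
writes the text to `<outbase>.txt`. Passing a list rather than a shell string
avoids quoting problems with punctuation in the whitelist. For the
`-l eng+chi_sim` route, the whitelist argument would be dropped, as Nick notes.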

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en