[tesseract-ocr] Re: Possible to prioritise some characters over others during OCR?

Ashish Goel Tue, 31 May 2016 22:02:10 -0700

I would also like to find a way to avoid such cases. I am seeing similar 
problems: extra white space, lower/upper case mismatches, and wrongly 
detected characters...


On Tuesday, May 31, 2016 at 11:40:28 PM UTC+5:30, Diederik Hattingh wrote:
>
> I have a case where Tesseract isn't detecting URLs as expected. (More 
> details in my SO question 
> <http://stackoverflow.com/questions/37533524/tweak-tesseract-for-better-detection-of-urls-in-image>.)
>
> The http:// part is being recognised as http:II. If I specify a 
> whitelist of characters that doesn't include capital I, Tesseract 
> recognises the string correctly.
>
> Is it possible for me to specify a priority of characters to recognize?
>
> Any other ideas on how to tweak the parameters to increase my accuracy?  
>
> <http://i.stack.imgur.com/jO1u9.png>
>
>
> Is incorrectly read as "http:II11111111111111111111111111111111111 
> 1111111111111111111.coml"
>
>
>
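
The whitelist workaround described in the quoted message can be sketched 
roughly as follows. This is only an illustration under a few assumptions: 
the character set is the one RFC 3986 allows in URLs, the file name url.png 
and the helper names build_whitelist and build_command are hypothetical, and 
the whitelist is passed through the tessedit_char_whitelist config variable 
on the command line. Whether the whitelist is honoured can depend on the 
Tesseract version and engine in use, so treat this as a starting point.

```python
import string

# Characters that may appear in a URL (RFC 3986 unreserved + reserved + "%").
URL_CHARS = string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%"


def build_whitelist(exclude="I"):
    """Return the URL character set minus the characters in `exclude`.

    Hypothetical helper: by default it drops capital I, the character
    that was being confused with "/" in the quoted example.
    """
    return "".join(ch for ch in URL_CHARS if ch not in exclude)


def build_command(image_path, exclude="I"):
    """Build (but do not run) a tesseract command line using the whitelist.

    Returning an argument list means it can be handed to subprocess.run()
    without shell quoting problems for characters such as & or ;.
    """
    whitelist = build_whitelist(exclude)
    return ["tesseract", image_path, "stdout",
            "-c", "tessedit_char_whitelist=" + whitelist]


if __name__ == "__main__":
    # "url.png" is a placeholder file name, not a file from the thread.
    print(build_command("url.png"))
```

The obvious trade-off is that a URL which really does contain a capital I 
can no longer be read correctly, so this only suits input where that 
character is not expected.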

