[tesseract-ocr] Possible to prioritise some characters over others during OCR?

Diederik Hattingh Tue, 31 May 2016 11:11:44 -0700

I have a case where Tesseract isn't detecting URLs as expected. (More 
details are in my SO question 
<http://stackoverflow.com/questions/37533524/tweak-tesseract-for-better-detection-of-urls-in-image>.)



The "http://" part is being recognised as "http:II", i.e. the two slashes 
are read as capital letter I's. If I specify a whitelist of characters that 
doesn't include capital I, Tesseract recognises the string correctly.
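
For reference, the whitelist workaround can be sketched roughly as below. 
This is only a minimal sketch, assuming the tesseract command-line binary is 
on the PATH and accepts "-c" to set the tessedit_char_whitelist variable. The 
helper names, the file name url.png and the exact punctuation set are 
hypothetical and would need adjusting for real URLs.

```python
import shutil
import string
import subprocess

# Punctuation kept in the whitelist. This small set is an assumption;
# extend it if the URLs contain query strings or other characters.
URL_PUNCTUATION = ":/.-_~"


def build_whitelist(excluded="I"):
    """Return letters, digits and URL punctuation, minus the excluded chars."""
    candidates = string.ascii_letters + string.digits + URL_PUNCTUATION
    return "".join(ch for ch in candidates if ch not in excluded)


def build_command(image_path, whitelist):
    """Build a tesseract invocation that prints recognised text to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]


def ocr_url(image_path):
    """Run tesseract with the whitelist; requires the binary to be installed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    cmd = build_command(image_path, build_whitelist())
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8").strip()
```

Calling ocr_url("url.png") would then run the OCR with capital I excluded. 
The drawback is that a URL which really does contain a capital I can no 
longer be read correctly, which is why I would prefer a priority over a hard 
exclusion.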

Is it possible to give some characters priority over others during 
recognition, for example to prefer "/" over capital I?

Any other ideas on how to tweak the parameters to increase my accuracy?  

<http://i.stack.imgur.com/jO1u9.png>


The image above is incorrectly read as "http:II11111111111111111111111111111111111 
1111111111111111111.coml"

