I have a case where Tesseract isn't detecting URLs as expected. (More details in my SO question <http://stackoverflow.com/questions/37533524/tweak-tesseract-for-better-detection-of-urls-in-image>.)
The http:// part is being recognized as http:II. If I specify a whitelist of characters that doesn't include capital I, Tesseract recognizes the string correctly. Is it possible to specify a priority among the characters to recognize? Any other ideas on how to tweak the parameters to increase accuracy?

<http://i.stack.imgur.com/jO1u9.png> is incorrectly read as "http:II11111111111111111111111111111111111 1111111111111111111.coml"
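
For reference, the whitelist workaround mentioned above can be sketched roughly as follows. This assumes Python 3 and a tesseract binary on the PATH, and it uses the tessedit_char_whitelist config variable passed through the -c option. The helper names (url_whitelist, build_tesseract_command, run_ocr) and the exact punctuation set are hypothetical, chosen for illustration; they are not necessarily the whitelist used in the experiment described here.

```python
import string
import subprocess

# Assumption: punctuation commonly found in URLs; adjust for your own images.
URL_PUNCTUATION = ":/.-_?=&%#~+"


def url_whitelist(excluded="I"):
    """Return letters, digits and URL punctuation, minus the excluded characters."""
    candidates = string.ascii_letters + string.digits + URL_PUNCTUATION
    return "".join(ch for ch in candidates if ch not in excluded)


def build_tesseract_command(image_path, whitelist):
    """Build the argument list for a tesseract call that prints text to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path):
    """Run tesseract with the restricted character set and return its output."""
    command = build_tesseract_command(image_path, url_whitelist())
    # Passing a list (no shell) avoids quoting problems with '&', '?' and '#'.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling run_ocr("jO1u9.png") would run the OCR with capital I removed from the allowed characters. The drawback is that any URL that really contains a capital I can no longer be read correctly, which is why a way to rank characters rather than exclude them would be preferable.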

