Tesseract is mostly used to recognize text in images. From what I understand, you want to protect yourself from phishing. A good way to do that is to get familiar with the Levenshtein distance algorithm. It's very simple: it counts the minimum number of single-character changes (insertions, deletions, or substitutions) needed to turn one string into another. For example, comparing paiipal with paypal gives a distance of 2: replace one i with y and remove the other i.
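If it helps to see it in code, here is a minimal sketch of the standard dynamic-programming version in Python. The function name `levenshtein` is my own choice, not something from Tesseract or any particular library. Note that with the standard definition a substitution counts as a single edit, so paiipal vs paypal comes out as 2; if you only allowed additions and removals, you would count 3.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] = distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("paiipal", "paypal"))  # prints 2
```

This keeps only two rows of the table in memory, which is plenty for short strings such as domain or company names.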
Why am I suggesting this? Because your problem has already been solved in a slightly different setting: the corporate world. Sometimes a dishonest employee will replace the company name on a document with a near-identical name, for example one with two letters swapped. Small alterations like this are hard for a human to notice, as you pointed out, but very easy for a machine to detect. I hope this helps. If not, maybe I did not fully understand your intentions, and you would need to clarify why you need to use Tesseract so I can help further.
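To make the company-name check concrete, here is a rough sketch of how a machine could flag near-identical names. Everything in it is an assumption for illustration: the `KNOWN_NAMES` list, the `find_lookalike` helper, and the threshold of 2 are hypothetical, and you would tune them to your own data.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical list of names you trust; replace with your own.
KNOWN_NAMES = ["paypal", "google", "amazon"]


def find_lookalike(name, known=KNOWN_NAMES, max_distance=2):
    """Return the trusted name that `name` closely imitates, or None.

    An exact match (distance 0) is not a lookalike, so it is skipped.
    """
    name = name.lower()
    for trusted in known:
        if 0 < levenshtein(name, trusted) <= max_distance:
            return trusted
    return None


for candidate in ["paiipal", "papyal", "paypal", "example"]:
    print(candidate, "->", find_lookalike(candidate))
```

Two swapped letters, as in papyal, count as two substitutions here, so a threshold of 2 still catches them. A lower threshold reduces false alarms on short names but misses swaps.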

