On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> On Fri, 12 May 2023, Matija Nalis wrote:
> > That is because those domains are not EQUAL? Or did you want a
> > rule that checks only on SIMILAR domain names (e.g. with lowercase
> > letter "L" replaced with number "1" as in your example)?
> 
> Now I get it, the OP is looking for some kind of comparison function that
> does an "apparent linguistic distance" evaluation of two strings and returns
> a score that indicates a "visual similarity" value.
> (EG replacing 'l' with '1' or 'O' with '0', etc).

It should be relatively easy to write an SA plugin for that:

- replace the look-alike digits and uppercase letters in one of the
  strings with the letters they imitate, convert both to lowercase,
  and compare them

- it should also remove spacer characters (like "paypal" vs "pay-pal")

- it should also not only hit on exact matches, but return a similarity
  percentage (so trying to fake "spamassassin" with "spamasassin"
  can be detected).
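
A rough sketch of those three steps, in Python rather than as an actual
SA plugin (a real plugin would be Perl). The names LOOKALIKES, SPACERS,
normalize() and similarity() are made up for illustration, the
replacement table only covers the two pairs mentioned in this thread,
and difflib's ratio is just one possible similarity measure:

```python
import difflib

# Hypothetical look-alike table: only the pairs mentioned in the thread.
# A real table would need many more entries.
LOOKALIKES = str.maketrans({"1": "l", "0": "o"})

# Hypothetical set of "spacer" characters to drop before comparing.
SPACERS = "-_"


def normalize(name: str) -> str:
    """Lowercase, map look-alike digits to letters, drop spacer characters."""
    name = name.lower().translate(LOOKALIKES)
    return "".join(ch for ch in name if ch not in SPACERS)


def similarity(a: str, b: str) -> float:
    """Return similarity of the two normalized names as a percentage (0-100)."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b))
    return 100.0 * matcher.ratio()


if __name__ == "__main__":
    for fake, real in [("paypa1", "paypal"),
                       ("pay-pal", "paypal"),
                       ("spamasassin", "spamassassin")]:
        print(f"{fake} vs {real}: {similarity(fake, real):.1f}%")
```

For simplicity this normalizes both strings instead of only one, which
gives the same result. A rule would then fire when the score is above
some threshold but the names are not literally equal.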

Of course, non-ASCII would complicate those replacement tables
significantly (there are MANY more similar-looking glyphs than in
pure ASCII), but as I treat any IDN domains as suspicious, and they
are easy to detect, it would probably not be such a big deal.
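
To show what I mean by "easy to detect", here is a minimal Python
sketch. looks_like_idn() is a made-up helper name; it only assumes that
an IDN shows up either as a punycode label (starting with "xn--") or
with raw non-ASCII characters in the domain:

```python
def looks_like_idn(domain: str) -> bool:
    """True if any label is punycode-encoded or contains non-ASCII characters."""
    labels = domain.lower().rstrip(".").split(".")
    return any(label.startswith("xn--") or not label.isascii()
               for label in labels)


if __name__ == "__main__":
    # The second domain contains a Cyrillic "a" (U+0430) in place of the Latin one.
    for domain in ["paypal.com", "p\u0430ypal.com", "xn--example.com"]:
        print(domain.encode("unicode_escape").decode(), looks_like_idn(domain))
```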

> I've hand coded rules to check for this stuff when frequently abused but I
> don't know of a programmatic algorithm to do it automagically.

I wonder if someone has already done it, or whether something
sufficiently similar exists that could be used for that purpose?

-- 
Opinions above are GNU-copylefted.
