The same happens with other HTML tags: <img src= can be replaced with <img xyz/src=, where the xyz part can be virtually any run of characters other than > (which would close the tag).
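Why do browsers still honour src here? Under the WHATWG HTML tokenizer rules, a / inside a start tag that is not immediately followed by > is an "unexpected solidus" parse error. The tokenizer skips it and starts the next attribute, so xyz/src= yields two attributes, xyz and src. The Python sketch below mimics only that part of the tokenizer. attribute_names is a hypothetical helper, not part of SpamAssassin or any library, and it ignores character references and other edge cases.

```python
WHITESPACE = "\t\n\f\r "


def attribute_names(tag: str) -> list[str]:
    """Return the attribute names a WHATWG-style tokenizer would see in one start tag.

    Hypothetical, simplified sketch: handles quoted/unquoted values and the rule
    that a '/' not immediately followed by '>' is skipped as a parse error,
    after which a new attribute begins. Character references are ignored.
    """
    if not tag.startswith("<"):
        raise ValueError("expected a start tag beginning with '<'")
    i, n = 1, len(tag)
    # Skip the tag name (ends at whitespace, '/' or '>').
    while i < n and tag[i] not in WHITESPACE + "/>":
        i += 1
    names: list[str] = []
    while i < n:
        c = tag[i]
        if c in WHITESPACE or c == "/":
            # A stray '/' is treated like a separator between attributes.
            i += 1
            continue
        if c == ">":
            break
        # Attribute name: first char always belongs to it, then up to ws, '/', '>' or '='.
        start = i
        i += 1
        while i < n and tag[i] not in WHITESPACE + "/>=":
            i += 1
        names.append(tag[start:i].lower())
        # Optional whitespace, then an optional '=' and value.
        while i < n and tag[i] in WHITESPACE:
            i += 1
        if i < n and tag[i] == "=":
            i += 1
            while i < n and tag[i] in WHITESPACE:
                i += 1
            if i < n and tag[i] in "\"'":
                quote = tag[i]
                end = tag.find(quote, i + 1)
                i = n if end == -1 else end + 1
            else:
                # Unquoted value runs to whitespace or '>'.
                while i < n and tag[i] not in WHITESPACE + ">":
                    i += 1
    return names


if __name__ == "__main__":
    print(attribute_names('<img xyz/src="https://some.phishing.site/x.png">'))
```

Run on <img xyz/src="...">, the sketch reports both xyz and src, which is why the image still loads even though a naive src= match misses it.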
So, with Giovanni's permission, I'll tighten the nut one more turn (limiting the gap to 100 characters to prevent a regex self-DoS):

rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/

Pete.

On Thursday, September 14, 2023 at 04:37:15 PM GMT+2, <giova...@paclan.it> wrote:

On 9/14/23 16:24, Bill Cole wrote:
> On 2023-09-14 at 04:37:03 UTC-0400 (Thu, 14 Sep 2023 17:37:03 +0900)
> Joe Wein via users <joew...@surbl.org>
> is rumored to have said:
>
>> I filed a bug for this issue on Bugzilla (#8186) but so far no response from
>> developers.
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186
>
> FWIW, I've thought about it a bit...
>
>> We're seeing literally millions of phishing spams from Tencent VMs in
>> Singapore targeting mostly Amazon Japan that are getting around SA checks
>> because of this issue.
>
> Wow. I didn't expect that this was that big of a tactic.
>
>> I am wondering how many other users are seeing this problem which allows
>> spammers to circumvent URI checks in links in spam (i.e. hide the payload
>> sites).
>
> I don't see it, but the systems I manage have no reason to expect anything
> but criminal-grade spam from anything on a Tencent network in Singapore.
> Everyone gets their own bespoke spamstream I guess.
>
>> They do it by prefixing the href= attribute in an HTML <a href="..."> tag
>> with letters and a slash, for example:
>>
>> <a h/href="https://some.phishing.site">https://amazon.co.jp</a>
>>
>> Both Chrome and mail clients like Mozilla Thunderbird discard that "h/"
>> prefix (perhaps treating it as a separate unrecognizable attribute, like
>> "<a h href="...") and display a clickable link to the payload site, while
>> SpamAssassin does not see the URI and therefore does not run it through any
>> of the rules for URIs.
>>
>> This means even if the bad site is listed on domain RBLs (SURBL, Spamhaus or
>> URIBL), the mail is not tagged for that.
>>
>> Joe Wein
>> SURBL
>
> I'm thinking that the best approach may not be in trying to parse the bogus
> tag to glean a domain that may or may not be known to be bad, but rather to
> detect the general pattern, which is itself a direct indicator of bad intent.
>

rawbody BADHREF /\s+.\/href\=/

should be a start for writing a rule to catch those spam messages.

Giovanni
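A quick way to compare the two rawbody patterns in this thread is to run them over a few snippets. This is a Python sketch, assuming Python's re treats these simple patterns the same way SpamAssassin's Perl engine does (\/ and \= are literal / and = in both). The sample snippets and the hits helper are hypothetical, and real rawbody rules run over the decoded message body with HTML left in place, not over isolated strings like these.

```python
import re

# The two rawbody patterns from this thread, copied unchanged.
RULES = {
    "BADHREF (Giovanni)": re.compile(r"\s+.\/href\="),
    "BADHREF (Pete)": re.compile(r"<(a|img|video)[^>]{0,100}\/(src|href)\="),
}

# Hypothetical sample snippets, not taken from real spam.
SAMPLES = [
    '<a h/href="https://some.phishing.site">https://amazon.co.jp</a>',
    '<a xyz/href="https://some.phishing.site">https://amazon.co.jp</a>',
    '<img xyz/src="https://some.phishing.site/logo.png">',
    '<a href="https://amazon.co.jp">https://amazon.co.jp</a>',
    '<a href="https://cdn.example/src=1">legit link</a>',
]


def hits(snippet: str) -> list[str]:
    """Return the names of the rules whose pattern matches the snippet."""
    return [name for name, rx in RULES.items() if rx.search(snippet)]


if __name__ == "__main__":
    for s in SAMPLES:
        print(hits(s), s)
```

With these samples, Giovanni's pattern only catches a single-character prefix before /href=, while Pete's also catches longer prefixes and src= on img. Pete's pattern also flags the last, harmless link because its URL path contains /src=, and both patterns are case-sensitive as written, so <A H/HREF=...> would need an /i modifier to match.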