The same happens with other HTML tags...
<img src= can be replaced with <img xyz/src=
(the prefix can be virtually any characters except >)

So, with Giovanni's permission, I tightened the nut one more turn (capping the
gap at 100 chars to prevent a regex self-DoS):
rawbody BADHREF /<(a|img|video)[^>]{0,100}\/(src|href)\=/
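
For anyone who wants to sanity-check the pattern outside SpamAssassin, here is a
minimal Python sketch. It only exercises the regex itself: Python's re module
stands in for Perl here (this particular pattern uses nothing that differs
between the two), and the sample lines, the BADHREF variable and the is_bad()
helper are made up for illustration. It says nothing about how rawbody feeds
lines to the rule.

```python
import re

# The pattern from the rule above, copied as-is (no /i flag, so case-sensitive).
BADHREF = re.compile(r"<(a|img|video)[^>]{0,100}\/(src|href)\=")


def is_bad(line: str) -> bool:
    """Return True if the line contains a tag with a slash-prefixed src/href."""
    return BADHREF.search(line) is not None


# Hypothetical sample lines, not taken from real spam.
samples = [
    '<a h/href="https://example.com">link</a>',       # obfuscated href
    '<img xyz/src="https://example.com/x.png">',      # obfuscated src
    '<a href="https://example.com">link</a>',         # normal link
    '<img src="https://example.com/x.png">',          # normal image
    '<a href="https://example.com/href=1">link</a>',  # URL that itself contains /href=
]

for line in samples:
    print(is_bad(line), line)
```

The first two samples match and the two normal tags do not. The last one also
matches, because the URL itself contains "/href=", so the rule can hit on
legitimate links of that shape and is probably best given a modest score.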


Pete.


    On Thursday, September 14, 2023 at 04:37:15 PM GMT+2, <giova...@paclan.it> 
wrote:  
 
 On 9/14/23 16:24, Bill Cole wrote:
> On 2023-09-14 at 04:37:03 UTC-0400 (Thu, 14 Sep 2023 17:37:03 +0900)
> Joe Wein via users <joew...@surbl.org>
> is rumored to have said:
> 
>> I filed a bug for this issue on Bugzilla (#8186) but so far no response from 
>> developers.
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8186
> 
> FWIW, I've thought about it a bit...
> 
>> We're seeing literally millions of phishing spams from Tencent VMs in 
>> Singapore targeting mostly Amazon Japan that are getting around SA checks 
>> because of this issue.
> 
> Wow. I didn't expect that this was that big of a tactic.
> 
>> I am wondering how many other users are seeing this problem which allows 
>> spammers to circumvent URI checks in links in spam (i.e. hide the payload 
>> sites).
> 
> I don't see it, but the systems I manage have no reason to expect anything 
> but criminal-grade spam from anything on a Tencent network in Singapore. 
> Everyone gets their own bespoke spamstream I guess.
> 
>> They do it by prefixing the href= attribute in an HTML <a href="..."> tag 
>> with letters and a slash, for example:
>>
>> <a h/href="https://some.phishing.site">https://amazon.co.jp</a>
>>
>> Both Chrome and mail clients like Mozilla Thunderbird discard that "h/" 
>> prefix (perhaps treating it as a separate unrecognizable attribute, like "<a 
>> h href="...") and display a clickable link to the payload site while 
>> SpamAssassin will not see the URI and therefore not run it through any of the 
>> rules for URIs.
>>
>> This means even if the bad site is listed on domain RBLs (SURBL, Spamhaus or 
>> URIBL), the mail is not tagged for that.
>>
>> Joe Wein
>> SURBL
> 
> I'm thinking that the best approach may not be in trying to parse the bogus 
> tag to glean a domain that may or may not be known to be bad, but rather to 
> detect the general pattern, which is itself a direct indicator of bad intent.
> 
rawbody BADHREF /\s+.\/href\=/

should be a starting point for a rule to catch those spam messages.
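
A rough way to see what this first pattern covers, as a Python sketch of the
regex only (Python's re standing in for Perl; the sample lines, BADHREF_V1 and
matches_v1() are hypothetical names used for illustration):

```python
import re

# The starting-point pattern from the rule above, copied as-is.
BADHREF_V1 = re.compile(r"\s+.\/href\=")


def matches_v1(line: str) -> bool:
    """Return True if whitespace, one character and '/href=' appear in a row."""
    return BADHREF_V1.search(line) is not None


print(matches_v1('<a h/href="https://example.com">link</a>'))    # one-character prefix
print(matches_v1('<a xyz/href="https://example.com">link</a>'))  # longer prefix
print(matches_v1('<a href="https://example.com">link</a>'))      # normal tag
```

Only the first line matches: the single "." allows exactly one character
between the whitespace and the slash, so longer prefixes and other attributes
such as src are not covered by this version.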
  Giovanni

  
