Re: Mal formed urls

Bill Cole Thu, 25 Feb 2021 13:31:38 -0800

On 25 Feb 2021, at 13:37, Rick Cooper wrote:

I was just working on some rules to catch the current crop of malformed urls used to escape detection by solutions that extract urls from emails and compare them to known bad urls and I am wondering if spamassassin's patterns for extraction take this into account?
For instance:

https:www.google.com/mail
https:\/www.google.com/mail
https:\\www.google.com/mail
Will all work at getting you to gmail because the technical spec doesn't actually require \\ after the colon.

Of course not: A http: URI must NOT contain '\\' after the colon, it MUST contain '//' after the colon. See https://tools.ietf.org/html/rfc7230#section-2.7.1 which is the technical spec for the formal syntax of a http URI. OTOH, there are URI schemes which do not include '//' (e.g. mailto:) so any tool that is doing broad URI detection can't be too picky.
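To make the strict-versus-loose distinction concrete, here is a minimal Python sketch of a check for the "scheme, then '//', then a host" shape that the spec requires. The name is_strict_http_uri and the simplified pattern are assumptions for illustration only; this is not the full RFC grammar and it is not code from SpamAssassin.

```python
import re

# Simplified shape of an http(s) URI: the scheme, a colon, a literal "//",
# then a non-empty authority. The authority is approximated here as any run
# of characters other than "/", "?", "#", backslash, or whitespace.
# (Assumption: this is a rough stand-in, not the complete RFC grammar.)
STRICT_HTTP = re.compile(r"^https?://[^/?#\\\s]+", re.IGNORECASE)


def is_strict_http_uri(text: str) -> bool:
    """Hypothetical helper: True only if text starts like a well-formed http(s) URI."""
    return STRICT_HTTP.match(text) is not None


if __name__ == "__main__":
    samples = [
        r"https://www.google.com/mail",  # well-formed
        r"https:www.google.com/mail",    # no slashes at all
        r"https:\/www.google.com/mail",  # backslash, then slash
        r"https:\\www.google.com/mail",  # two backslashes
    ]
    for sample in samples:
        print(sample, is_strict_http_uri(sample))
```

A check this strict would reject all three of the malformed examples quoted above, which is exactly why a detector that wants to catch them cannot rely on it alone.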

What flavors of garbage almost-URIs will work in a browser very much depends on the whims of browser developers, and whether those are 'clickable' in your preferred MUA is dependent on the gullibility of your MUA author.

SpamAssassin traditionally has assumed that there will always be some MUA and browser authors who lack any sense of caution or prudence, so SA is VERY loose with what it will consider as maybe being a hostname in something that could be a URI in some obscure or novel scheme.

Will spamassassin still extract and normalize the urls above?


Yes, it will see all 3 as the same canonicalized URI.
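As an illustration of what "the same canonicalized URI" means for these three examples, here is a minimal Python sketch that rewrites any mix of slashes and backslashes after the scheme as '//'. The canonicalize helper and its pattern are hypothetical and are not taken from SpamAssassin's source; SA's own extraction is considerably broader.

```python
import re

# An http or https scheme, a colon, then any run (possibly empty) of
# forward slashes and backslashes.
# (Assumption: this covers only the three quoted variants, nothing more.)
LOOSE_PREFIX = re.compile(r"^(https?):[/\\]*", re.IGNORECASE)


def canonicalize(text: str) -> str:
    """Hypothetical helper: rewrite a loose http(s) prefix as 'scheme://'."""
    return LOOSE_PREFIX.sub(lambda m: m.group(1).lower() + "://", text, count=1)


if __name__ == "__main__":
    variants = [
        r"https:www.google.com/mail",
        r"https:\/www.google.com/mail",
        r"https:\\www.google.com/mail",
    ]
    # A set keeps one copy of each distinct canonical form.
    unique = {canonicalize(v) for v in variants}
    print(unique)
```

Collecting the results in a set mirrors the idea that otherwise identical URIs collapse to a single entry once they have been canonicalized.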

I was hoping
to avoid digging through the source to find out.

No need to dig through the source: you can see what URIs SpamAssassin detects (trimmed of the parts after the hostname) in a message by manually testing it with 'spamassassin -D uri'. Note that SA will only show one instance of otherwise identical URIs after trimming and canonicalization.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
