On 03/09/2013 03:57 PM, Marc Andre Selig wrote:

I believe that, just as the label in the first example is too long, the
label in this example is simply too short (i.e. null).

In this case, the domain name has been split across three lines, probably
in an attempt to foil simple URIBL scanners.  This is the relevant part
of the original message body:

----- cut here -----
<a href="http://podify-merchants.
.
com/?dWlkPTI4OTA4NzEwMSZjaWQ9MjczODUmbGlkPTEmcm49Y2l0">
----- cut here -----

Whitespace within the URL is removed in line with RFC 1738/2396/3986,
and we end up with "http://podify-merchants..com/?...", which is of
course invalid.
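
To make the failure mode concrete, here is a minimal sketch of the whitespace removal and the resulting empty label. It is not SpamAssassin's actual code; the helper names collapse_whitespace and label_problems are hypothetical, and the 63-octet limit per label is the usual DNS rule.

```python
import re
from urllib.parse import urlsplit

def collapse_whitespace(url):
    """Drop all whitespace (including line breaks) embedded in a URL."""
    return re.sub(r"\s+", "", url)

def label_problems(host):
    """Describe every DNS label in host that is empty or over 63 octets.

    Hypothetical helper for illustration only.
    """
    # A single trailing dot only marks the DNS root, so ignore it.
    if host.endswith("."):
        host = host[:-1]
    problems = []
    for index, label in enumerate(host.split(".")):
        if label == "":
            problems.append(f"label {index} is empty")
        elif len(label.encode("utf-8")) > 63:
            problems.append(f"label {index} is longer than 63 octets")
    return problems

# The href from the message body above, split across three lines.
raw_href = (
    "http://podify-merchants.\n"
    ".\n"
    "com/?dWlkPTI4OTA4NzEwMSZjaWQ9MjczODUmbGlkPTEmcm49Y2l0"
)

joined = collapse_whitespace(raw_href)
host = urlsplit(joined).hostname
print(host)
print(label_problems(host))
```

A parser built along these lines could report the empty label as a property of the URL and carry on, rather than treating it as an error.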

It seems to be an error on the part of the spammer, as this domain name
is written correctly (without the duplicate dot, but still split across
three lines) elsewhere in the same message.  Again, I think SpamAssassin
should be able to handle this without flagging an error message.

One good way to test these cases is to open such a spam message in Thunderbird and perhaps some flavour of Outlook.

If they can't handle the URL, then we can't really expect SA to handle it easily either. You could use the ASK plugin intensively and walk the fine line between parsing errors and FPs (been there!). There are so many cases of CICO (crap in, crap out) that it's hardly possible to foresee what template borks a spammer might deliver.

For stuff like that we need to rely on hashers, Bayes and other content rules.


