On 03/09/2013 03:57 PM, Marc Andre Selig wrote:
I believe that, just as the label in the first example is too long, the label in this example is simply too short (i.e. null). In this case, the domain name has been split across three lines, probably in an attempt to foil simple URIBL scanners. This is the relevant part of the original message body:
----- cut here -----
<a href="http://podify-merchants.
.
com/?dWlkPTI4OTA4NzEwMSZjaWQ9MjczODUmbGlkPTEmcm49Y2l0">
----- cut here -----
Whitespace within the URL is removed in line with RFC 1738/2396/3986, and we end up with "http://podify-merchants..com/?...", which is of course invalid. It seems to be an error on the part of the spammer, as this domain name is written correctly (without the duplicate dot, but still split across three lines) elsewhere in the same message.
Again, I think SpamAssassin should be able to handle this without flagging an error message.
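To make the quoted case concrete, here is a rough Python sketch of what happens to that href. It is not SpamAssassin's actual code; the helper names `normalize_href` and `host_label_problems` are made up for illustration, and the line breaks in `raw` are my assumption about how the href was split in the original body. It strips the whitespace as described above and then checks each host label for the two failure modes mentioned (an empty label, or one over the 63-octet DNS limit).

```python
import re
from urllib.parse import urlsplit

def normalize_href(raw):
    """Remove all whitespace from a raw href value (hypothetical helper)."""
    return re.sub(r"\s+", "", raw)

def host_label_problems(url):
    """Return a list of problems found in the host labels of url."""
    host = urlsplit(url).hostname or ""
    # A single trailing dot only marks the DNS root, so ignore it.
    if host.endswith("."):
        host = host[:-1]
    problems = []
    for label in host.split("."):
        if label == "":
            problems.append("empty label")
        elif len(label) > 63:
            problems.append("label longer than 63 octets")
    return problems

# Assumed reconstruction of the href as it was split across three lines.
raw = ("http://podify-merchants.\n.\n"
       "com/?dWlkPTI4OTA4NzEwMSZjaWQ9MjczODUmbGlkPTEmcm49Y2l0")

url = normalize_href(raw)
print(url)                       # host now contains ".."
print(host_label_problems(url))  # reports the empty label
```

A scanner built along these lines could skip or quietly log such a URL instead of raising an error, which is roughly the behaviour being asked for.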
One good way to test these cases is to open such a spam message in Thunderbird and maybe some flavour of Outlook.
If they can't handle the URL, then we can't really expect SA to handle it easily either. You could use the ASK plugin intensively and walk the fine line between parsing errors and FPs (been there!). There are so many cases of CICO (crap-in-crap-out) that it's hardly possible to foresee what template borks a spammer might deliver.
For stuff like that we need to rely on hashers, Bayes and other content rules.
