http://bugzilla.spamassassin.org/show_bug.cgi?id=3013
------- Additional Comments From [EMAIL PROTECTED] 2004-02-06 11:39 -------

Removing the . and _ sounds like a decent solution to me (though my experience with spamassassin is limited to the past week). With web addresses so common, . has become a common delimiter to separate words. For instance, Java uses the .tld.domain.project.subproject naming scheme for classes. It's no mistake that many of the X-Mailer: headers use internet domains as their identifiers.

I think you're right that there's no simple way of distinguishing 'ckGmqXGFWNfaNAxRse' from 'ClassifiedVentures' using regular expressions. Assuming what you're really looking for is randomly generated X-Mailer strings (or some ratware guy just hitting keys on his keyboard), you might just look at the "information content" of the string. 'ckGmqXGFWNfaNAxRse' is a random string of upper/lowercase text, whereas 'ClassifiedVentures' is not random at all. The random string contains more "information"; the non-random one contains less.

A simple test might be trying to compress the string. If it's very compressible, it has low information content and wasn't generated randomly. If it's not very compressible, it has high information content and was probably randomly generated.

Slightly off topic, but could this kind of test be applied to other parts of a message too? I've noticed a lot of spam with random strings inserted in an attempt to get past filters. If you could identify these strings as random, you could add to a mail's spam rating.
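The compression test suggested in the comment above could be sketched roughly as follows. This is only an illustration in Python using the standard zlib module, not anything taken from SpamAssassin; the names compression_ratio and random_letters are hypothetical, and no spam/ham threshold is implied.

```python
import random
import string
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size / original size; lower means more compressible."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    # Level 9 asks zlib for its strongest (slowest) compression setting.
    return len(zlib.compress(raw, 9)) / len(raw)


def random_letters(length: int, seed: int = 0) -> str:
    """Hypothetical helper: a reproducible string of random ASCII letters."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


if __name__ == "__main__":
    samples = {
        "repetitive": "abc" * 100,          # highly structured, compresses well
        "random": random_letters(300),      # little structure, compresses poorly
        "short random": "ckGmqXGFWNfaNAxRse",
        "short words": "ClassifiedVentures",
    }
    for label, sample in samples.items():
        print(f"{label:>12}: {compression_ratio(sample):.2f}")
```

One caveat worth checking before relying on this: zlib output carries a few bytes of fixed header and checksum overhead, so for strings as short as the two 18-character examples the ratio will likely exceed 1 for both and may not separate them. The idea is probably more useful on longer spans of text, or with a measure designed for short strings.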
