On Mon, 01 Feb 2010 12:09:24 -0500 Adam Katz <antis...@khopis.com> wrote:
> Martin Gregorie wrote:
> > There was a recent suggestion that 'personal name' text from the
> > From: header should be included in the text examined by 'body'
> > rules, which already includes the Subject: text. This sounds like a
> > good thing to do.
>
> My tests have been mildly successful on this note, with FROM_WWW
> already getting promoted out of testing:
> http://ruleqa.spamassassin.org/?rule=/FROM_W&srcpath=khop
>
> This indicates that we don't actually need to parse any further
> because there is no sizable mass of legitimate mail that does this
> (and hopefully by getting this rule out the door, people considering
> it might decide against it).

When I suggested making changes, it wasn't specifically to do with
URLs - clearly a URL has no place in a From header, so it's a near-sure
spam indicator. The real issue is the textual content, particularly
obfuscated words: tests like SUBJECT_FUZZY_VPILL should also run
against the From name, IMO. A good spam has to let the reader know
roughly what it's selling before it's deleted from the message list,
and a single word is enough to achieve that.

The situation with Bayes is worse. AFAIK the subject is tokenized via
the body, so I was expecting that "From" tokens might be incompatible
with body/subject tokens; but when I tested this, I found that the
From name is not tokenized at all [bug 6319].
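
To make the idea concrete, here is a rough Python sketch of running a
fuzzy-word test against the From name rather than the Subject. It is
not SpamAssassin code: the regex is a made-up stand-in and not the real
SUBJECT_FUZZY_VPILL pattern, and the function names are hypothetical.
It only assumes the display name can be pulled out of the header with
the standard library's parseaddr().

```python
import re
from email.utils import parseaddr

# Hypothetical obfuscation pattern for "pills" (illustration only, NOT
# the actual SUBJECT_FUZZY_VPILL regex). It tolerates punctuation
# between letters and a few common look-alike characters.
FUZZY_PILL = re.compile(
    r"p[\W_]*[i1!|l][\W_]*[l1|][\W_]*[l1|][\W_]*s?",
    re.IGNORECASE,
)


def from_display_name(from_header: str) -> str:
    """Return the 'personal name' part of a From: header value."""
    name, _addr = parseaddr(from_header)
    return name


def from_name_hits(from_header: str) -> bool:
    """True if the display name (not the address) matches the pattern."""
    return bool(FUZZY_PILL.search(from_display_name(from_header)))
```

Only the display name is examined, so an address that happens to
contain the word does not trigger the check on its own.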