Re: How should this tricky spam be filtered?

RW Tue, 02 Feb 2010 06:37:17 -0800

On Mon, 01 Feb 2010 12:09:24 -0500
Adam Katz <antis...@khopis.com> wrote:

> Martin Gregorie wrote:

> > There was a recent suggestion that 'personal name' text from the
> > From: header should be included in the text examined by 'body'
> > rules, which already includes the Subject: text. This sounds like a
> > good thing to do.
> 
> My tests have been mildly successful on this note, with FROM_WWW
> already getting promoted out of testing:
> http://ruleqa.spamassassin.org/?rule=/FROM_W&srcpath=khop
> 
> This indicates that we don't actually need to parse any further
> because there is no sizable mass of legitimate mail that does this
> (and hopefully by getting this rule out the door, people considering
> it might decide against it).

When I suggested making changes, it wasn't specifically to do with
URLs - clearly a URL has no place in a From header, so it's a
near-certain spam indicator.

The real issue is the textual content, particularly obfuscated words.
Tests like SUBJECT_FUZZY_VPILL should also run against the From name,
IMO. A good spam has to let the reader know roughly what it's selling
before it's deleted from the message list. A single word is enough to
achieve that.
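
To illustrate the idea, here is a rough Python sketch of running a fuzzy
word test against the From display name. It is not SpamAssassin code: the
pattern below is a hypothetical stand-in, not the actual
SUBJECT_FUZZY_VPILL expression, and the function names are made up for
the example.

```python
import re
from email.utils import parseaddr

# Hypothetical pattern, NOT the real SUBJECT_FUZZY_VPILL expression:
# matches "viagra" with common character substitutions and optional
# non-alphanumeric separators between the letters.
FUZZY_WORD = re.compile(
    r"v[\W_]*[i1l!|][\W_]*[a@4][\W_]*g[\W_]*r[\W_]*[a@4]",
    re.IGNORECASE,
)


def from_name(header_value):
    """Return the display-name part of a From: header value."""
    name, _addr = parseaddr(header_value)
    return name


def from_name_is_fuzzy(header_value):
    """True if the display name contains the obfuscated word."""
    return FUZZY_WORD.search(from_name(header_value)) is not None
```

Only the display name is examined here, so an address that happens to
contain the word would not trigger the test.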

The situation with Bayes is worse. AFAIK the subject is tokenized via
the body, so I was expecting that "From" tokens might be incompatible
with body/subject tokens; but when I tested this, I found that the From
name is not tokenized at all [bug 6319].
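
For what it's worth, one way the compatibility worry could be handled is
to tag From-name tokens with their own prefix so they are learned
separately from body/subject tokens. The Python below is only a sketch of
that idea under my own assumptions; it is not how SpamAssassin's Bayes
tokenizer works, and the "from:" prefix and function names are invented
for the example.

```python
import re
from email.utils import parseaddr


def tokenize(text, prefix=""):
    """Lowercase alphanumeric tokens, optionally tagged with a prefix."""
    return [prefix + word for word in re.findall(r"[a-z0-9]+", text.lower())]


def message_tokens(from_header, subject, body):
    """Collect tokens from subject, body and the From display name."""
    name, _addr = parseaddr(from_header)
    tokens = tokenize(subject) + tokenize(body)
    # Hypothetical "from:" prefix: keeps display-name tokens apart from
    # body/subject tokens so their statistics are kept separately.
    tokens += tokenize(name, prefix="from:")
    return tokens
```

With this, a display name of "Cheap Pills" would contribute the tokens
from:cheap and from:pills rather than being dropped.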
