On Thursday, July 25, 2013 03:23:39 AM Karsten Bräckelmann wrote:
> On Wed, 2013-07-24 at 20:28 -0400, Ian Turner wrote:
> > I notice that the old rule ADDRESS_IN_SUBJECT was dropped starting in
> > SpamAssassin 3.3 (The change is in bug 5123 and commit 467038). Lately,
> > however, I've started getting a lot of spam again where the To: address is
> > in the subject. Perhaps it's time to evaluate restoring this rule?
>
> Well, how do they score usually? It's hardly worth adding a point if
> they are rather high scoring anyway.
>
> header LOCALPART_IN_SUBJECT eval:check_for_to_in_subject('user')
>
> And all of them do hit that rule. A super-set of the ADDRESS variant,
> using the local part instead of the complete address. Still in stock
> rules.

They are moderately low-scoring, sadly (I wouldn't have noticed otherwise!),
mainly due to bayes poison. A typical message looks like this:

 0.0 NO_DNS_FOR_FROM        DNS: Envelope sender has no MX or A DNS records
 1.9 DATE_IN_FUTURE_06_12   Date: is 6 to 12 hours after Received: date
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                            [score: 0.0000]
 0.5 MISSING_MID            Missing Message-Id: header
 0.8 RDNS_NONE              Delivered to internal network by a host with no
                            rDNS
 0.0 T_DKIM_INVALID         DKIM-Signature header exists but is not valid

Looking at the code for check_for_to_in_subject, the regular expression used
for LOCALPART_IN_SUBJECT is rather different from (and much more specific
than) the one used for ADDRESS_IN_SUBJECT. Presumably that's why
LOCALPART_IN_SUBJECT doesn't match this spam.

An example subject from this spam (address changed to protect the innocent):

    <[email protected]>_Need Approval for Fast Funds? July 24th 2013_

For "address" mode, the regex is this one:

    /\b\Q$full_to\E\b/i

But for "user" mode, the regex is this one:

    /^(?:
        (?:re|fw):\s*(?:\w+\s+)?\Q$to\E$
        |(?-i:\Q$to\E)\s*[,:;!?-](?:$|\s)
        |\Q$to\E$
        |,\s*\Q$to\E[,:;!?-]$
    )/ix

Among other restrictions, this regex seems to only match the username at the
beginning or end of the subject.
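
To make the difference concrete, here is a rough Python sketch of the two
patterns. It is a port for illustration only, not the actual Perl from
SpamAssassin: the function name to_in_subject and the address
user@example.com are made up for the example, and re.escape stands in for
Perl's \Q...\E. Note that the ^ sits outside the alternation, so every branch
of the "user" pattern has to start matching at the very first character of
the subject.

```python
import re


def to_in_subject(subject: str, full_to: str, mode: str) -> bool:
    """Illustrative port of the two regexes; `mode` is 'address' or 'user'."""
    if mode == "address":
        # Perl: /\b\Q$full_to\E\b/i  (full address anywhere in the subject)
        pattern = r"\b" + re.escape(full_to) + r"\b"
        return re.search(pattern, subject, re.IGNORECASE) is not None

    if mode == "user":
        # Local part only, i.e. everything before the first "@".
        to = re.escape(full_to.split("@", 1)[0])
        # Perl: the /ix pattern for 'user' mode; ^ applies to all branches.
        pattern = rf"""^(?:
            (?:re|fw):\s*(?:\w+\s+)?{to}$
            |(?-i:{to})\s*[,:;!?-](?:$|\s)
            |{to}$
            |,\s*{to}[,:;!?-]$
        )"""
        flags = re.IGNORECASE | re.VERBOSE
        return re.search(pattern, subject, flags) is not None

    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical recipient; the subject has the same shape as the spam example.
recipient = "user@example.com"
subject = "<user@example.com>_Need Approval for Fast Funds? July 24th 2013_"

print(to_in_subject(subject, recipient, "address"))  # True
print(to_in_subject(subject, recipient, "user"))     # False
```

With a subject of this shape, the "address" pattern hits because the full
address appears between word boundaries, while the "user" pattern fails
immediately: the subject starts with "<", so none of the branches can match.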
--Ian