On 20-Nov-2006, at 05:52, twofers wrote:
> header NOT_IN_ENGLISH Subject !~ /English/i
> describe NOT_IN_ENGLISH Subject Contains Non English Characters
> score NOT_IN_ENGLISH 3.5
> What regexp could I use?
Well, that's tricky. Sometimes the subject is encoded and sometimes
it's not. If you want to catch non-7-bit characters in the Subject,
that's pretty simple: [^ -~] (or however else you specify that range).
The range from ' ' (space) to '~' covers the normal printable 7-bit
characters, so anything outside it, for example £ or ¥, will match.
It will do nothing, however, if the subject is encoded.
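A minimal sketch of that test, for illustration only: it applies the same [^ -~] class (Python's `re` treats this class the same way Perl does) to raw, undecoded Subject bytes, roughly the way grep sees them. The function name `has_non_7bit` and the sample subjects are made up.

```python
import re

# Any byte outside the printable 7-bit range ' ' (0x20) .. '~' (0x7E).
NON_PRINTABLE_7BIT = re.compile(rb'[^ -~]')

def has_non_7bit(raw_subject: bytes) -> bool:
    """Return True if the raw (undecoded) subject has a byte outside ' '..'~'."""
    return NON_PRINTABLE_7BIT.search(raw_subject) is not None

samples = [
    b'Pyzor Issues',                       # plain ASCII: no match
    'Prix en \u00a3'.encode('latin-1'),    # raw 8-bit pound sign: match
    b'=?ISO-8859-1?Q?Prix_en_=A3?=',       # same text, RFC 2047 encoded: no match
]
for s in samples:
    print(has_non_7bit(s), s)
```

The third sample is why the class does nothing for encoded subjects: on the wire, an encoded subject is entirely plain ASCII.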
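Some possible patterns you might want to filter on. The last two match
quoted-printable escapes in an encoded subject; the hex codes in the
final one are the ISO-8859-1 values of å Å ä Ä ö Ö à á â ç è é ê ë: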
[¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿åÅäÄöÖàáâçèéêë]
[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
[àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
=[89A-F][0-9A-F]
=(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
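A sketch of how the last two patterns behave on an RFC 2047 "Q"-encoded subject. =[89A-F][0-9A-F] matches any escaped byte from 0x80 to 0xFF, while the alternation only matches the listed accented letters. The sample subject is invented, and neither pattern catches base64 ("B") encoded subjects. Python's standard `email.header.decode_header` is used only to show what the escapes stand for.

```python
import re
from email.header import decode_header

# Any quoted-printable escape for a byte in 0x80..0xFF.
QP_HIGH_BYTE = re.compile(r'=[89A-F][0-9A-F]')
# Only the ISO-8859-1 codes for å Å ä Ä ö Ö à á â ç è é ê ë.
QP_ACCENTED = re.compile(r'=(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)')

encoded = '=?ISO-8859-1?Q?Gr=F6=DFe_aus_M=FCnchen?='   # hypothetical subject

print(QP_HIGH_BYTE.findall(encoded))   # ['=F6', '=DF', '=FC']
print(QP_ACCENTED.findall(encoded))    # ['F6'] (ß and ü are not in the list)

# Decoding confirms what those escapes stand for.
(raw, charset), = decode_header(encoded)
print(raw.decode(charset))             # Größe aus München
```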
However, just so you know, here is what I get running a grep over my
spamassassin-users mail:
$ grep -e '^Subject:' spamassassin-users* | grep -e '[^ -~]'
Subject: 70_sare_header.cf dupe
Subject: Re: possible memory memory with SA 3.0.3 under Debian Linux (me too)
Subject: 21:22:05为什么要做*逃*兵?
Subject: Re: SpamAssassin integrated with MailScanner, using per-user configuration
Subject: Re: spamassassin less effective after upgrade to 3.1.0: some checks no
Subject: Re: spamassassin less effective after upgrade to 3.1.0: some checks no
Subject: ?ڭ̤w????a?????Ȩ?????ʦ??J ( mailman-owner )
Subject: [SPAM] orkut - Aninha.linda enviou um convite para voc?!
Subject: Pyzor Issues
Subject: Re: The best way to use Spamassassin is to not use Spamassassin
Subject: Undeliverable:RE: Rule for mail contains bad email ids
Subject: Re: [EMAIL PROTECTED]: RE: SPAM: Increase in targeted
Subject: Re: Sa-learn --ham vs spamassassin -report
Subject: Re: rbl checks from 20_dnsbl_tests.cf won't work after upgradingto 3.1.5
Subject: Re: rbl checks from 20_dnsbl_tests.cf won't work after upgradingto 3.1.5
Subject: Re: Work has been closed permanently
Subject: Your online activity confirmation
Subject: Re: ??=??=?? ??=??=??=??=??==??=??=??~!
Subject: Re: ??=??=?? ??=??=??=??=??==??=??=??~!
Many of those hits don't appear to contain anything unusual other than
a tab, so you might want to allow tab in your character class as well
(octal 011, hex 0x09), i.e. [^\t -~].
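A sketch of that adjustment, with invented sample subjects: adding \t to the allowed set stops tab-only hits like the ones above while still catching real 8-bit characters. Listing which bytes triggered a match (the hypothetical `offending_bytes` helper) makes it easier to tell the two kinds of hits apart.

```python
import re

STRICT = re.compile(rb'[^ -~]')      # original class: a tab counts as a hit
TAB_OK = re.compile(rb'[^\t -~]')    # tab (octal 011, hex 0x09) now allowed

def offending_bytes(raw_subject: bytes, pattern: re.Pattern) -> list:
    """Return the hex code of every byte the pattern flags."""
    return [m.group().hex() for m in pattern.finditer(raw_subject)]

subjects = [
    b'Re: hello\tworld',                 # only a tab
    'Re: voil\u00e0'.encode('latin-1'),  # a real 8-bit character (0xE0)
]
for s in subjects:
    print(offending_bytes(s, STRICT), offending_bytes(s, TAB_OK))
```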
--
"I don't think the kind of friends I'd have would care."