On 20-Nov-2006, at 05:52, twofers wrote:
  header        NOT_IN_ENGLISH     Subject !~ /English/i
  describe      NOT_IN_ENGLISH     Subject Contains Non English Characters
  score         NOT_IN_ENGLISH     3.5

  What regexp could I use?

Well, that's tricky. Sometimes the subject is encoded and sometimes it's not. If you want to catch non-7-bit characters in the Subject, that's pretty simple: [^ -~] (or however you prefer to specify that range).

The range from ' ' (space) to '~' covers the normal printable 7-bit characters, so you can test for anything outside it. That range of course does not include, for example, £ or ¥, so those will match. However, the test will do nothing if the subject is encoded, since an encoded subject consists only of 7-bit characters.
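
As a rough illustration of that character class, here is a minimal Python sketch. It is not SpamAssassin itself, and the name NON_7BIT and the sample subjects are made up for the example. It suggests that the class catches raw 8-bit characters but stays silent on an encoded subject:

```python
import re

# Matches any character outside the printable 7-bit range space..tilde.
NON_7BIT = re.compile(r"[^ -~]")

# Hypothetical sample subjects, for illustration only.
raw_subject = "Price list in £ and ¥"
encoded_subject = "=?ISO-8859-1?Q?Price_list_in_=A3_and_=A5?="
plain_subject = "Re: Pyzor Issues"

print(bool(NON_7BIT.search(raw_subject)))      # True: £ is outside the range
print(bool(NON_7BIT.search(encoded_subject)))  # False: the encoded form is pure ASCII
print(bool(NON_7BIT.search(plain_subject)))    # False
```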

Some possible characters you might want to filter on:

[¡¢£¤¥¦§¨©ª«¬ ®¯°±²³ ´µ¶·¸¹º»¼½¾¿åÅäÄöÖàáâçèéêë]
[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
[àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
=[89A-F][0-9A-F]
=(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
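
The last two patterns are aimed at quoted-printable encoded subjects. A small Python sketch of how they behave, assuming an ISO-8859-1 encoded subject (the sample strings and the names QP_HIGH_BYTE and QP_LISTED are invented for the example):

```python
import re

# Any quoted-printable escape whose byte value is 0x80 or above.
QP_HIGH_BYTE = re.compile(r"=[89A-F][0-9A-F]")
# Only the Latin-1 letters å Å ä Ä ö Ö à á â ç è é ê ë.
QP_LISTED = re.compile(r"=(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)")

# Hypothetical subjects, for illustration only.
encoded = "=?ISO-8859-1?Q?R=E4ksm=F6rg=E5s?="   # "Räksmörgås"
pound = "=?ISO-8859-1?Q?Only_=A3_5?="            # "Only £ 5"
ascii_only = "Re: 70_sare_header.cf dupe"

print(QP_LISTED.findall(encoded))             # ['E4', 'F6', 'E5']
print(bool(QP_HIGH_BYTE.search(pound)))       # True: =A3 has the high bit set
print(bool(QP_LISTED.search(pound)))          # False: £ is not in the listed set
print(bool(QP_HIGH_BYTE.search(ascii_only)))  # False
```

Note that a plain, unencoded subject that happens to contain something like "=A5" would also match these, so they are probably best combined with a check for the "=?charset?Q?" prefix.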

However, just so you know, running a grep over my spamassassin-user mail:

$ grep -e '^Subject:' spamassassin-users* | grep -e '[^ -~]'
Subject:        70_sare_header.cf dupe
Subject: Re: possible memory memory with SA 3.0.3 under Debian Linux (me too)
Subject: 21:22:05为什么要做*逃*兵?
Subject: Re: SpamAssassin integrated with MailScanner, using per- user configuration
Subject: Re: spamassassin less effective after upgrade to 3.1.0: some checks no
Subject: Re: spamassassin less effective after upgrade to 3.1.0: some checks no
Subject: ?ڭ̤w????a?????Ȩ?????ʦ??J      ( mailman-owner  )
Subject: [SPAM] orkut -  Aninha.linda enviou um convite para voc?!
Subject:        Pyzor Issues
Subject: Re: The best way to use Spamassassin is to not use Spamassassin
Subject:        Undeliverable:RE: Rule for mail contains bad email ids
Subject:        Re: [EMAIL PROTECTED]: RE: SPAM: Increase in targeted
Subject:        Re: Sa-learn --ham vs spamassassin -report
Subject: Re: rbl checks from 20_dnsbl_tests.cf won't work after upgradingto 3.1.5
Subject: Re: rbl checks from 20_dnsbl_tests.cf won't work after upgradingto 3.1.5
Subject:        Re: Work has been closed permanently
Subject:        Your online activity confirmation
Subject: Re: ??=??=?? ??=??=??=??=??==??=??=??~!
Subject: Re: ??=??=?? ??=??=??=??=??==??=??=??~!

I get a lot of subjects in there that don't appear to contain anything unusual other than a tab, so you might want to include the tab in your character class as well (octal 011, hex 0x09).
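
A quick Python sketch of the difference the tab makes, again with made-up names (WITHOUT_TAB, WITH_TAB) and sample strings rather than anything SpamAssassin-specific:

```python
import re

WITHOUT_TAB = re.compile(r"[^ -~]")    # tab counts as "non-7-bit" here
WITH_TAB = re.compile(r"[^\t -~]")     # tab (0x09) is now allowed

# Hypothetical header lines, for illustration only.
tabbed = "Subject:\tPyzor Issues"
accented = "Subject: convite para voc\xea!"   # \xea is ê

print(bool(WITHOUT_TAB.search(tabbed)))   # True: false positive caused by the tab
print(bool(WITH_TAB.search(tabbed)))      # False
print(bool(WITH_TAB.search(accented)))    # True: still catches the 8-bit character
```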

--
"I don't think the kind of friends I'd have would care."

