Re: Rule for Russian character sets
Hmm, let me see. I use the below in user_prefs. Hope that helps.

header J_CHSET3 Subject:raw =~ /\s=\?(windows-(125[0125]|874)|koi8-r|iso-8859-[28])\?/i
score J_CHSET3 5

ifplugin Mail::SpamAssassin::Plugin::TextCat
#ok_languages en zh.big5
#http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5697
ok_languages en zh
add_header all Languages _LANGUAGES_
score UNWANTED_LANGUAGE_BODY 5
endif

ok_locales en zh
Re: FW: Rule for Russian character sets (=?koi8-r? not quite a charset)
On Mon, 2008-02-18 at 09:36 +1300, Michael Hutchinson wrote:
> > > We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service.
> >
> > ok_locales en ja ko th zh
> >
> > This will allow anything but Cyrillic char sets. Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German Umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language specific chars.
>
> Ok now we're talking turkey. Thanks for providing the much needed clarity on ok_locales. I may just employ that technique yet, pending whether we get any more Russian spam through the gates.
>
> > Sorry, I did not mean to troll nor any kind of offense.
>
> You have my apologies, as being a Friday afternoon, I was pretty sick of work and shouldn't have taken it out on you or the list. Sorry.
>
> > Hope this clarifies my previous posts and is appreciated again...
>
> Your posts are appreciated, and sorry for the mean comment.

Thanks. No offense taken, no harm done, don't worry. :)

guenther

--
char *t=[EMAIL PROTECTED]; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
RE: Rule for Russian character sets
-----Original Message-----
> For the most part you can match any character by the appearance of the character. Any character with special meaning needs to be escaped in some way. The easiest way is usually with a backslash, but in some cases you can also do it by making it a member of a character class. So for your question mark case, you could do \? or [?], as most of the special characters lose their meaning in a character class. The exceptions are obviously right bracket, backslash, and dash becomes special if it isn't the first character.
>
> > /\=\?koi8\-r\?/

This is what I'd set up originally, except when I ran it past an RE interpreter the results were just.. wrong. I do think it would work, however, and will be testing it on a Virtual Machine today to be sure.

> This should work. You don't need to escape the dash, and I'm pretty sure you don't need to escape the equal sign; just the question mark. Also, you may want to handle this in both uppercase and lowercase, so you could do
>
> /=\?koi8-r\?/i
>
> And you probably don't need the = sign to get reasonably reliable matching.

Ah, this is the bit I was unsure about, limiting how many characters are escaped. I would tend towards the fully escaped one myself, I just wouldn't trust non-escaped = and ? signs. But that's probably got to do with some bad history with SpamAssassin :)

Thanks for reinforcing some points with RE that needed to be (:

Cheers,
Mike
FW: Rule for Russian character sets (=?koi8-r? not quite a charset)
-----Original Message-----
[snip]
> > We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service.
>
> ok_locales en ja ko th zh
>
> This will allow anything but Cyrillic char sets. Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German Umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language specific chars.

Ok now we're talking turkey. Thanks for providing the much needed clarity on ok_locales. I may just employ that technique yet, pending whether we get any more Russian spam through the gates.

> Sorry, I did not mean to troll nor any kind of offense.

You have my apologies, as being a Friday afternoon, I was pretty sick of work and shouldn't have taken it out on you or the list. Sorry.

> However, you missed my point. Getting detailed with REs is a good thing, sure. I was not about that -- but the RE in question does not properly handle charset encoding. See the Subject for an example which is not encoding, but will be matched by your rule.
>
> My point was, that the rule discussed aims at being something that it unfortunately is not, because charset encoding is slightly more complex and definitely requires a closing part. A Regular Expression that does this can be found in check_for_faraway_charset_in_headers() in HeaderEval.pm:
>
> $hdr =~ /=\?(.+?)\?.\?.*?\?=/g
>
> Hence my re-inventing the wheel analogy. And these wheels are quite flexible, too. ;-)
>
> Also, your rule applies to the Subject only, whereas ok_locales does check all MIME parts and will trigger on Russian spam with a western Subject.

The RE in question (my one) was not just written for the Subject; a separate rule was written for the raw From: line as well.

As we only score spam here and leave filing it to the MUA (unless a score of 25 is reached, where SA bins it), scoring against the Subject and From lines makes OK sense, because if you used simply (=?koi8-r?) in the subject it would not score high enough on its own to be filtered or blocked. (I'm trying to employ what I've learned from the SA webpage about writing multiple low-scoring rules, instead of a few big-scoring ones).

I can see it is flawed, but have to also admit that it is working rather well at the moment. Mind you, I have taken the time to translate some of the Russian spam, work out spammy phrases, and then quote those phrases to be scored against by SA.

> Hope this clarifies my previous posts and is appreciated again...

Your posts are appreciated, and sorry for the mean comment.

Cheers,
Mike
RE: Rule for Russian character sets (=?koi8-r? not quite a charset)
On Fri, 2008-02-15 at 17:10 +1300, Michael Hutchinson wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> >
> > Why are you guys now trying to re-invent the wheel in the special case of a gray asphalt street? What about a dirt track, grass, and anything else a wheel works on?
> >
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service.

ok_locales en ja ko th zh

This will allow anything but Cyrillic char sets. Please note that en does *not* mean English locale despite its name. It applies to all Western charsets, including German Umlauts, Swedish, French, Turkish, etc. Basically everything that uses the characters in this post, plus language specific chars.

> That aside, I really don't think getting detailed with Regular Expressions is re-inventing the wheel. Rather, it is expanding knowledge that will help write better rules in the future. (More flexible wheels, in your context). Although I appreciated your earlier post of 'ok_locales', and understood it, I did not appreciate your Troll.

Sorry, I did not mean to troll nor any kind of offense.

However, you missed my point. Getting detailed with REs is a good thing, sure. I was not about that -- but the RE in question does not properly handle charset encoding. See the Subject for an example which is not encoding, but will be matched by your rule.

My point was, that the rule discussed aims at being something that it unfortunately is not, because charset encoding is slightly more complex and definitely requires a closing part. A Regular Expression that does this can be found in check_for_faraway_charset_in_headers() in HeaderEval.pm:

$hdr =~ /=\?(.+?)\?.\?.*?\?=/g

Hence my re-inventing the wheel analogy. And these wheels are quite flexible, too. ;-)

Also, your rule applies to the Subject only, whereas ok_locales does check all MIME parts and will trigger on Russian spam with a western Subject.

Hope this clarifies my previous posts and is appreciated again...

guenther
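The point about the closing part can be checked outside SpamAssassin. The following is a minimal sketch in Python rather than Perl or SpamAssassin configuration; the names `NAIVE` and `ENCODED_WORD` are made up for illustration, the first pattern is the one discussed in this thread and the second is the expression quoted from HeaderEval.pm. It compares both against this thread's own Subject and against the raw spam Subject from the original post.

```python
import re

# The pattern discussed in the thread: only the opening "=?koi8-r?".
NAIVE = re.compile(r"=\?koi8-r\?", re.IGNORECASE)

# The expression quoted from HeaderEval.pm: a complete encoded-word,
# "=?charset?X?payload?=", including the closing "?=".
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

thread_subject = "RE: Rule for Russian character sets (=?koi8-r? not quite a charset)"
spam_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="

# The naive pattern fires on both subjects, including the harmless one.
naive_on_thread = NAIVE.search(thread_subject) is not None
naive_on_spam = NAIVE.search(spam_subject) is not None

# The full pattern only fires where a complete encoded-word is present.
full_on_thread = ENCODED_WORD.search(thread_subject)
full_on_spam = ENCODED_WORD.search(spam_subject)

print("naive:", naive_on_thread, naive_on_spam)
print("full: ", full_on_thread is not None, full_on_spam is not None)
if full_on_spam:
    print("charset:", full_on_spam.group(1))
```

For patterns this simple, Python's `re` module and Perl interpret the expressions the same way, so the comparison should carry over to a SpamAssassin rule, although that has not been verified against a live installation here.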
Re: Rule for Russian character sets
I believe that what you are asking for is

meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)

That requires first that you have set up ok_locales.

--Paul

Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> >
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> What's the best way to test the character set for use in a meta rule? We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like
>
> if (character set is Russian) && (body contains 'xyzzy')
>
> for instance. How would we test the character set?

--
Paul Douglas Franklin
Computer Manager, Union Gospel Mission of Yakima, Washington
Husband of Danette
Father of Laurene, Miriam, Tycko, Timothy, Sarabeth, Marie, Dawnita, Anna Leah, Alexander, and Caleb
Re: Rule for Russian character sets
KB> If you want to trigger on Russian only, list all but ru.

What if to catch Ms. Ba'loney Margar'ine, airport security had to keep a current list of all the other people in the world. So this is the wrong approach, as we've been thru before. OK, bye.
Re: Rule for Russian character sets
On Fri, 2008-02-15 at 11:04 -0800, Paul Douglas Franklin wrote:
> I believe that what you are asking for is
>
> meta RUSSIAN_AND_BADTEXT (CHARSET_FARAWAY && __OTHER_RULE)
>
> That requires first that you have set up ok_locales.

If you have TextCat enabled, then the X-Language: meta header will be added and can be used with rules, although it doesn't show up in the output. I don't think that there is an equivalent X-Locales:

> --Paul
>
> Rosenbaum, Larry M. wrote:
> > > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> > >
> > > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
> >
> > What's the best way to test the character set for use in a meta rule? We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like
> >
> > if (character set is Russian) && (body contains 'xyzzy')
> >
> > for instance. How would we test the character set?

--
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com
RE: Rule for Russian character sets
> From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
>
> I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.

What's the best way to test the character set for use in a meta rule? We don't want to reject all messages with the Russian (Cyrillic) character set, but we may want to use something like

if (character set is Russian) && (body contains 'xyzzy')

for instance. How would we test the character set?
RE: Rule for Russian character sets
On Fri, 2008-02-15 at 11:49 -0500, Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
> >
> > I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> What's the best way to test the character set for use in a meta rule? We don't want to reject

SA doesn't reject anyway. It merely classifies and tags mail.

> all messages with the Russian (Cyrillic) character set, but we may want to use something like
>
> if (character set is Russian) && (body contains 'xyzzy')

Well, it depends... If it is ok for you to treat all char sets which you did not set in ok_locales the same way, then it is just a regular meta rule -- and, based on my understanding of your description, re-scoring of the few CHARSET_FARAWAY rules.

> for instance. How would we test the character set?

This, I believe, cannot be done with the current HeaderEval plugin, since it does not report the char set, but treats all unwanted char sets the same. However, if you need fine-grained rules per char set, it should be fairly easy to alter the existing plugin or to write custom rules or a plugin based on this.

guenther
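As a rough illustration of what per-charset handling could look like, here is a sketch in Python, not a SpamAssassin plugin. It reuses the encoded-word expression quoted elsewhere in this thread from HeaderEval.pm to pull the declared charset names out of a raw header value. The function names and the `CYRILLIC_CHARSETS` set are hypothetical and exist only for this example; a real plugin would be written in Perl against SpamAssassin's plugin API, which is not shown here.

```python
import re

# Encoded-word expression as quoted in this thread from HeaderEval.pm.
ENCODED_WORD = re.compile(r"=\?(.+?)\?.\?.*?\?=")

# Hypothetical, illustrative list of charsets commonly used for Cyrillic text.
CYRILLIC_CHARSETS = {"koi8-r", "koi8-u", "windows-1251", "iso-8859-5"}


def charsets_in_header(raw_value):
    """Return the lowercased charset names declared in a raw header value."""
    return [name.lower() for name in ENCODED_WORD.findall(raw_value)]


def uses_cyrillic_charset(raw_value):
    """True if any encoded-word in the header declares a Cyrillic charset."""
    return any(name in CYRILLIC_CHARSETS for name in charsets_in_header(raw_value))


if __name__ == "__main__":
    samples = [
        "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=",
        "=?ISO-8859-1?Q?Gr=FC=DFe?=",
        "Rule for Russian character sets",
    ]
    for sample in samples:
        print(charsets_in_header(sample), uses_cyrillic_charset(sample))
```

Unlike ok_locales, a check of this kind only looks at one header value and would say nothing about the body or other MIME parts.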
Re: Rule for Russian character sets
On Sat, 2008-02-16 at 04:26 +0800, [EMAIL PROTECTED] wrote:
> KB> If you want to trigger on Russian only, list all but ru.
>
> What if to catch Ms. Ba'loney Margar'ine, airport security had to keep a current list of all the other people in the world. So this is the wrong approach, as we've been thru before. OK, bye.

Thank you for your most valuable contribution.

Yes, we've been through this before. However, it seems you still don't understand. There IS NO negated counterpart to ok_locales. Also, this is not about languages, but character sets -- and there are exactly 6. So, listing all but one in this context doesn't seem to be asking too much.

Instead of ranting, just try to understand ok_locales as an option to list all character sets you can read. For most people, this boils down to one or two anyway. Thus, the general use case is to list just these.

Also, the OP specifically asked to catch Russian only. Listing 5 locales is the only way to do this currently. If you know about a better way, please let me know. Otherwise, you just wasted everyone's time.

Had a bad day, eh?

guenther
Rule for Russian character sets
We're suddenly getting a ton of spam with koi8-r encoding... I tried to do a custom rule for it like this:

header SUBJ_RUSS_CHAR Subject =~ /koi8-r/i
describe SUBJ_RUSS_CHAR has Russian char encoding
score SUBJ_RUSS_CHAR 3.5

The short headers for these spams look like this:

Subject: [koi8-r] ???

The raw Subject header, like this:

Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=

I would think the rule would catch it either way... what am I missing?

TIA,

James Smallacombe
PlantageNet, Inc. CEO and Janitor
[EMAIL PROTECTED]
http://3.am
Re: Rule for Russian character sets
[EMAIL PROTECTED] wrote:
> We're suddenly getting a ton of spam with koi8-r encoding... I tried to do a custom rule for it like this:
>
> header SUBJ_RUSS_CHAR Subject =~ /koi8-r/i
> describe SUBJ_RUSS_CHAR has Russian char encoding
> score SUBJ_RUSS_CHAR 3.5
>
> The short headers for these spams look like this:
>
> Subject: [koi8-r] ???
>
> The raw Subject header, like this:
>
> Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
>
> I would think the rule would catch it either way... what am I missing?

I think this should work:

header SUBJ_RUSS_CHAR Subject:raw =~ /koi8-r/i

/Per Jessen, Zürich
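Why the `:raw` variant behaves differently can be illustrated outside SpamAssassin. The sketch below is Python, not SpamAssassin configuration, and rests on the assumption (a guess made elsewhere in this thread, not a statement about SpamAssassin's internals) that a plain `Subject` rule is matched against the decoded header text. The helper `decode_subject` is a made-up name for this example.

```python
import re
from email.header import decode_header

# Raw Subject header taken from the original post.
RAW_SUBJECT = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="

CHARSET_NAME = re.compile(r"koi8-r", re.IGNORECASE)


def decode_subject(raw):
    """Decode RFC 2047 encoded-words into a plain Unicode string."""
    parts = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            # Encoded-words come back as bytes plus the declared charset.
            text = text.decode(charset or "ascii", errors="replace")
        parts.append(text)
    return "".join(parts)


decoded = decode_subject(RAW_SUBJECT)

# The charset label is part of the raw header only; once the header is
# decoded, what remains is the Cyrillic text itself.
raw_hit = CHARSET_NAME.search(RAW_SUBJECT) is not None
decoded_hit = CHARSET_NAME.search(decoded) is not None

print("raw header matches:    ", raw_hit)
print("decoded header matches:", decoded_hit)
```

In other words, the string koi8-r is a label attached to the header's encoding, not part of the subject text, which is consistent with the rule only firing on the raw form.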
Re: Rule for Russian character sets
On Thu, 2008-02-14 at 10:17 -0500, [EMAIL PROTECTED] wrote:
> We're suddenly getting a ton of spam with koi8-r encoding... I tried to do a custom rule for it like this:
>
> header SUBJ_RUSS_CHAR Subject =~ /koi8-r/i
> describe SUBJ_RUSS_CHAR has Russian char encoding
> score SUBJ_RUSS_CHAR 3.5
>
> I would think the rule would catch it either way... what am I missing?

I guess it's being decoded before matching. It's not the actual subject anyway, but a charset definition.

Instead of writing your own rules to catch these, I suggest using ok_locales. See the Language Options:

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html

If you want to trigger on Russian only, list all but ru. However, you probably want more like en (all western charsets) only. ;) Also, this will trigger on header as well as on the body.

grep for CHARSET_FARAWAY in the rules, if you want to adjust its scores.

guenther
Re: Rule for Russian character sets
On Thu, 14 Feb 2008, Per Jessen wrote:
> [EMAIL PROTECTED] wrote:
> > We're suddenly getting a ton of spam with koi8-r encoding... I tried to do a custom rule for it like this:
> >
> > header SUBJ_RUSS_CHAR Subject =~ /koi8-r/i
> > describe SUBJ_RUSS_CHAR has Russian char encoding
> > score SUBJ_RUSS_CHAR 3.5
> >
> > The short headers for these spams look like this:
> >
> > Subject: [koi8-r] ???
> >
> > The raw Subject header, like this:
> >
> > Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
> >
> > I would think the rule would catch it either way... what am I missing?
>
> I think this should work:
>
> header SUBJ_RUSS_CHAR Subject:raw =~ /koi8-r/i

That did it, thanks!

James Smallacombe
PlantageNet, Inc. CEO and Janitor
[EMAIL PROTECTED]
http://3.am
RE: Rule for Russian character sets
-----Original Message-----
> > > We're suddenly getting a ton of spam with koi8-r encoding... I tried to do a custom rule for it like this:
> > >
> > > header SUBJ_RUSS_CHAR Subject =~ /koi8-r/i
> > > describe SUBJ_RUSS_CHAR has Russian char encoding
> > > score SUBJ_RUSS_CHAR 3.5
> > >
> > > The short headers for these spams look like this:
> > >
> > > Subject: [koi8-r] ???
> > >
> > > The raw Subject header, like this:
> > >
> > > Subject: =?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?=
> > >
> > > I would think the rule would catch it either way... what am I missing?
> >
> > I think this should work:
> >
> > header SUBJ_RUSS_CHAR Subject:raw =~ /koi8-r/i
>
> That did it, thanks!

Are we not meant to delimit characters like a minus sign? Ex:

header SUBJ_RUSS_CHAR Subject:raw =~ /koi8\-r/i

I would really like to trap the question marks too, just in case someone sends a legitimate email with koi8-r in the subject (i.e. "why does email with the koi8-r character set get tagged as spam?"). In other words, the following rule (if it worked) would be nice to use instead. Ex:

header SUBJ_RUSS_CHAR Subject:raw =~ /\=\?koi8\-r\?/

where we could trap the equals sign and two question marks.

I have not employed this rule because I think it's dodgy. The Regexp expander over at SARE says there is a scary amount of matches (2000+) with that rule, so I'm presuming that the matching for the equals character and the question mark is not working properly, and that they will have to be delimited some other way, for example using the \x1B notation, but I've had no luck with this.

Does anyone have suggestions for matching question marks and equals signs in one line? I would like to match everything exactly between the double quotes:

"=?koi8-r?"

If I were to go by the perldoc docs I'd be using

\=\?koi8\-r\?

But I don't want to test it on my live server, because of the output of the Regex expander utility.

Help anyone?

Cheers,
Mike
RE: Rule for Russian character sets
On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> Are we not meant to delimit characters like a minus sign? Ex:
>
> header SUBJ_RUSS_CHAR Subject:raw =~ /koi8\-r/i

Only where they have special meaning, and a dash is only special in a character set, e.g. [A-Z]. I have found the simplest way to avoid misinterpretation in that context is to put the dash first, e.g. [-abcde12345]

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
...to announce there must be no criticism of the President or to stand by the President right or wrong is not only unpatriotic and servile, but is morally treasonous to the American public.
-- Theodore Roosevelt, 1918
---
8 days until George Washington's 276th Birthday
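The dash behaviour described above is easy to confirm in isolation. This is a small sketch in Python rather than Perl; for these particular constructs both regex flavours behave the same way. All variable names are illustrative.

```python
import re

# Inside a character class, a dash between two characters denotes a range.
range_class = re.compile(r"[a-c]")
# Placed first in the class, the dash is a literal member instead.
literal_dash_class = re.compile(r"[-ac]")

# Outside a class the dash is an ordinary character, so escaping it is
# harmless but unnecessary.
plain = re.compile(r"koi8-r")
escaped = re.compile(r"koi8\-r")

range_matches_b = range_class.search("b") is not None
range_matches_dash = range_class.search("-") is not None
literal_matches_dash = literal_dash_class.search("-") is not None
literal_matches_b = literal_dash_class.search("b") is not None

sample = "=?koi8-r?B?"
same_result = (plain.search(sample) is not None) == (escaped.search(sample) is not None)

print(range_matches_b, range_matches_dash)
print(literal_matches_dash, literal_matches_b)
print(same_result)
```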
RE: Rule for Russian character sets
-----Original Message-----
From: John Hardin [mailto:[EMAIL PROTECTED]
Sent: Friday, 15 February 2008 2:19 p.m.
To: Michael Hutchinson
Cc: users@spamassassin.apache.org
Subject: RE: Rule for Russian character sets

> On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> > Are we not meant to delimit characters like a minus sign? Ex:
> >
> > header SUBJ_RUSS_CHAR Subject:raw =~ /koi8\-r/i
>
> Only where they have special meaning, and a dash is only special in a character set, e.g. [A-Z]. I have found the simplest way to avoid misinterpretation in that context is to put the dash first, e.g. [-abcde12345]

Ok, fair enough. I've noticed that having the \ doesn't hurt for a dash.

Now what about matching a question mark and an equals sign? I'm tempted to set up SpamAssassin under a virtual machine, just so I can test against \= and \?

I've read perlre and perlretut and understand regular expressions, but there is no clear-cut way of matching these characters outlined either by those documents or by any SpamAssassin document I've come across so far. Except for a backslash, but I've heard no testimony that would suggest this line will work with SpamAssassin, and like before, the SARE Regular Expressions Expander tool doesn't like it (and may have put undue doubt in my head):

/\=\?koi8\-r\?/

I tried using \x1B notation, and it doesn't work, so presumably not every feature of Perl regular expressions works under SpamAssassin.

Cheers,
Mike
FW: Rule for Russian character sets
-----Original Message-----
From: John Hardin [mailto:[EMAIL PROTECTED]
Sent: Friday, 15 February 2008 3:07 p.m.
To: Michael Hutchinson
Subject: RE: Rule for Russian character sets

On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> Now what about matching a question mark and an equals sign?

An equals sign isn't special but a question mark is.

> Except for a backslash, but I've heard no testimony would suggest this line will work with Spamassassin, and like before, the SARE Regular Expressions Expander tool doesn't like it (and may have put un-due doubt in my head):
>
> /\=\?koi8\-r\?/

Try

/=\?koi8-r\?/i

NB: You can also use [?] (a character set consisting of a single question mark) but that's a little clumsy.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
It may be possible to start a programme of weapon registration as a first step towards the physical collection phase. ... Assurances must be provided, and met, that the process of registration will not lead to immediate weapons seizures by security forces.
-- the UN, who doesn't want to confiscate guns
---
8 days until George Washington's 276th Birthday
RE: Rule for Russian character sets
-----Original Message-----
From: John Hardin [mailto:[EMAIL PROTECTED]
Sent: Friday, 15 February 2008 3:07 p.m.
To: Michael Hutchinson
Subject: RE: Rule for Russian character sets

> On Fri, 15 Feb 2008, Michael Hutchinson wrote:
> > Now what about matching a question mark and an equals sign?
>
> An equals sign isn't special but a question mark is.
>
> > Except for a backslash, but I've heard no testimony would suggest this line will work with Spamassassin, and like before, the SARE Regular Expressions Expander tool doesn't like it (and may have put un-due doubt in my head):
> >
> > /\=\?koi8\-r\?/
>
> Try
>
> /=\?koi8-r\?/i
>
> NB: You can also use [?] (a character set consisting of a single question mark) but that's a little clumsy.

OK, sounds good, might just have to test that one under VMware as well. Results from the SARE Regexp expander weren't good; I don't know if I should trust that thing anymore.

Thanks,
Mike
RE: Rule for Russian character sets
On Fri, 2008-02-15 at 12:19 +1300, Michael Hutchinson wrote:
> [...]
> Does anyone have suggestions for matching question marks and equals signs in one line? I would like to match everything exactly between the double quotes:

Apart from neither equal nor minus being any special in an RE (outside a char class) unlike the question mark, which has been answered already...

Why are you guys now trying to re-invent the wheel in the special case of a gray asphalt street? What about a dirt track, grass, and anything else a wheel works on?

I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.

guenther
RE: Rule for Russian character sets
-----Original Message-----
From: Karsten Bräckelmann [mailto:[EMAIL PROTECTED]
Sent: Friday, 15 February 2008 3:43 p.m.
To: users@spamassassin.apache.org
Subject: RE: Rule for Russian character sets

> On Fri, 2008-02-15 at 12:19 +1300, Michael Hutchinson wrote:
> > [...]
> > Does anyone have suggestions for matching question marks and equals signs in one line? I would like to match everything exactly between the double quotes:
>
> Apart from neither equal nor minus being any special in an RE (outside a char class) unlike the question mark, which has been answered already...
>
> Why are you guys now trying to re-invent the wheel in the special case of a gray asphalt street? What about a dirt track, grass, and anything else a wheel works on?
>
> I've pointed it out before. Just use ok_locales, which is all about these char sets. No REs, almost no thinking required, no headache. A single line, and you're done.
>
> guenther

We don't want to only allow the English locale, because we (here at my work) do not want our international clients (non Russian) to be denied email service.

That aside, I really don't think getting detailed with Regular Expressions is re-inventing the wheel. Rather, it is expanding knowledge that will help write better rules in the future. (More flexible wheels, in your context).

Although I appreciated your earlier post of 'ok_locales', and understood it, I did not appreciate your Troll.

Cheers,
Mike
Re: Rule for Russian character sets
> Ok fair enough. I've noticed that having the \ doesn't hurt for a dash. Now what about matching a question mark and an equals sign?

If you read perlre closely you will find it says that it never hurts to put a backslash before a special character that you want to match as a character. So this is a case of "does nothing, but doesn't hurt".

> I've read perlre and perlretut and understand regular expressions, but there is no clear cut way of matching these characters, either outlined by this document or any Spamassassin document I've come across so far.

For the most part you can match any character by the appearance of the character. Any character with special meaning needs to be escaped in some way. The easiest way is usually with a backslash, but in some cases you can also do it by making it a member of a character class. So for your question mark case, you could do \? or [?], as most of the special characters lose their meaning in a character class. The exceptions are obviously right bracket, backslash, and dash becomes special if it isn't the first character.

> /\=\?koi8\-r\?/

This should work. You don't need to escape the dash, and I'm pretty sure you don't need to escape the equal sign; just the question mark. Also, you may want to handle this in both uppercase and lowercase, so you could do

/=\?koi8-r\?/i

And you probably don't need the = sign to get reasonably reliable matching.

Loren
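The escaping advice above can be tried out without touching a live server. The following sketch uses Python's `re` module instead of Perl or SpamAssassin; for these patterns the two agree. It compares the fully escaped, minimally escaped and character-class forms, and shows what happens when the question marks are left unescaped. The sample strings and variable names are illustrative only.

```python
import re

raw_subject = "=?koi8-r?B?9/zkINDSxcTQ0snR1MnKINPFzcnOwdI=?="
plain_subject = "why does koi8-r mail get tagged?"

patterns = {
    # Every punctuation character escaped: harmless, just noisy.
    "fully_escaped": re.compile(r"\=\?koi8\-r\?", re.IGNORECASE),
    # Only the question marks escaped: equivalent and easier to read.
    "minimal": re.compile(r"=\?koi8-r\?", re.IGNORECASE),
    # Question marks wrapped in a character class instead of escaped.
    "bracketed": re.compile(r"=[?]koi8-r[?]", re.IGNORECASE),
    # Nothing escaped: "?" now means "previous character is optional",
    # so this really asks for an optional "=", then "koi8-", then an
    # optional "r".
    "unescaped": re.compile(r"=?koi8-r?", re.IGNORECASE),
}

results = {
    name: (
        pattern.search(raw_subject) is not None,
        pattern.search(plain_subject) is not None,
    )
    for name, pattern in patterns.items()
}

for name, (hits_raw, hits_plain) in results.items():
    print(f"{name:14} raw={hits_raw} plain={hits_plain}")
```

The three escaped forms fire only on the raw encoded Subject, while the unescaped form also fires on an ordinary subject that merely mentions koi8-r, which is the kind of false positive the question marks were meant to prevent.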