On 02/28/2011 08:24 AM, Kris Deugau wrote:
> Mail reported by a customer as falsely tagged showed these rule hits.
> I've scored these rules down for now.
>
> Checking through the message text showed these likely matches:
>
> FRT_APPROV: approuvé
>
> FRT_EXPERIENCE: Expérience
>
> I'm pretty sure it's the accented 'e' in each word that's the trigger.

I agree. I have fixed those two specific examples on SA trunk at svn
revision 1075489.

Please note that this sort of thing is better handled as a bug report;
complaints directed at this list tend not to get such prompt attention.
Try filing it at https://issues.apache.org/SpamAssassin/ next time.
(Final note: it's better to note such a thing here than not at all.)

> Given that, it's likely that similar rules will misfire on other
> French words that are essentially spelled the same as in English, but
> add a few accents on a vowel or two.

This does indeed seem likely. Extra eyes from those of us versed in
non-English Latin-character languages would be quite helpful. This
could get you started:

    grep -riE '^(raw|body|header.*subject).*\(\?![a-z?]{2,}\)' rules*

If you have GNU grep with libpcre, this is better (and colored):

    grep --color -riP '^\s*(?:raw|body|header.*subject\s).*\(\?!\K[\w?]{2,}(?=\))' rules*

Use -h if you want to hide the file names.
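
For anyone wondering why an accent would trip these rules, here is a rough Python sketch of the mechanism as I understand it. The greps above look for rules that exempt the plain ASCII spelling with a negative lookahead and then match look-alike characters. Everything in the sketch is an assumption made for illustration: the character sets, the build_rule helper, and the exempt_accents option are hypothetical, and this is not the actual FRT_EXPERIENCE regex nor the change committed in r1075489.

```python
import re

# Hypothetical look-alike sets, much smaller than what real rules would use.
ACCENTS = {"a": "àáâ", "e": "èéêë", "i": "ìíîï", "o": "òóôö"}
SUBSTITUTES = {"a": "@4", "e": "3", "i": "1!", "o": "0"}


def build_rule(word, exempt_accents=False):
    """Compile an obfuscation-style pattern for `word`.

    The body accepts each letter, its accented forms, and its look-alike
    substitutes. The negative lookahead exempts the plain spelling, and
    with exempt_accents=True it exempts accented spellings as well.
    """
    body = "".join(
        "[" + ch + ACCENTS.get(ch, "") + SUBSTITUTES.get(ch, "") + "]"
        for ch in word
    )
    if exempt_accents:
        guard = "".join("[" + ch + ACCENTS.get(ch, "") + "]" for ch in word)
    else:
        guard = word
    return re.compile("(?!" + guard + ")" + body, re.IGNORECASE)


naive = build_rule("experience")
print(bool(naive.search("experience")))   # False: exempted by the lookahead
print(bool(naive.search("Expérience")))   # True: the accent slips past it
print(bool(naive.search("exp3rience")))   # True: the intended kind of hit

tolerant = build_rule("experience", exempt_accents=True)
print(bool(tolerant.search("Expérience")))  # False
print(bool(tolerant.search("exp3rience")))  # True
```

Whether widening the lookahead is the right fix for a given rule is a judgment call; the point is only that a lookahead listing the bare ASCII word will not exempt legitimate accented spellings.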