On 02/28/2011 08:24 AM, Kris Deugau wrote:
> Mail reported by a customer as falsely tagged showed these rule hits.
> I've scored these rules down for now.
> 
> Checking through the message text showed these likely matches:
> 
> FRT_APPROV:    approuvé
> 
> FRT_EXPERIENCE:    Expérience
> 
> I'm pretty sure it's the accented 'e' in each word that's the trigger.

I agree.  I have fixed those two specific examples on SA trunk at svn
revision 1075489.
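
For anyone wondering why an accent is enough to cause the hit, here is a
minimal sketch.  The pattern is a hypothetical, simplified stand-in and
NOT the real FRT_EXPERIENCE rule; the look-alike character class and the
widened guard are assumptions for illustration, not the change made in
the revision above.  The idea is that a guard such as (?!experience)
only excludes the plain English spelling, so an accented spelling gets
past it and then matches the look-alike classes.

```python
import re

# Hypothetical, simplified stand-in for an obfuscation rule.  This is NOT
# the real FRT_EXPERIENCE pattern; the character class is an assumption.
E = "[e3èéêë]"  # characters treated as look-alikes of 'e'

# The guard (?!experience) only excludes the plain English spelling.
naive = re.compile(rf"\b(?!experience){E}xp{E}r[i1]{E}nc{E}", re.IGNORECASE)

# One possible repair: let the guard exclude the accented spelling as well.
guarded = re.compile(
    rf"\b(?!exp[eé]rience){E}xp{E}r[i1]{E}nc{E}", re.IGNORECASE
)


def hits(pattern, text):
    """Return True if the rule pattern fires anywhere in the text."""
    return pattern.search(text) is not None


for word in ("experience", "exp3rience", "Expérience"):
    print(word, hits(naive, word), hits(guarded, word))
```

In this sketch the naive pattern fires on "exp3rience" and "Expérience"
but not on "experience"; the widened guard stops the French hit while
still catching the digit substitution.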

Please note that this sort of thing is better handled as a bug report;
complaints directed at this list tend not to get such prompt attention.
Try filing it at https://issues.apache.org/SpamAssassin/ next time.
(Final note:  it's better to raise such a thing here than not at all.)


> Given that, it's likely that similar rules will misfire on other
> French words that are essentially spelled the same as in English, but
> add a few accents on a vowel or two.

This does indeed seem likely.  Extra eyes from those of us versed in
non-English Latin-character languages would be quite helpful.

This could get you started:

grep -riE '^(raw|body|header.*subject).*\(\?![a-z?]{2,}\)' rules*

If you have GNU grep with libpcre, this is better (and colored):

grep --color -riP \
  '^\s*(?:raw|body|header.*subject\s).*\(\?!\K[\w?]{2,}(?=\))' rules*

Use -h if you want to hide the file names.
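
If GNU grep with PCRE support is not available, roughly the same search
can be sketched in Python.  This is only an approximation of the second
grep command: the function name and the sample rule lines are
hypothetical, made up for illustration and not copied from the real
rule set.  It collects the word inside each (?!...) guard so the list
can be reviewed for words that have accented equivalents.

```python
import re

# Approximates the PCRE grep: a raw/body/header-Subject rule line that
# contains a negative lookahead guarding a plain word, e.g. (?!approv).
GUARD = re.compile(
    r"^\s*(?:raw|body|header.*subject\s).*\(\?!([\w?]{2,})(?=\))",
    re.IGNORECASE,
)


def guarded_words(lines):
    """Return the sorted, unique words found inside (?!...) guards."""
    found = set()
    for line in lines:
        match = GUARD.search(line)
        if match:
            found.add(match.group(1).lower())
    return sorted(found)


# Hypothetical rule lines, for illustration only.
sample = [
    r"body DEMO_APPROV /\b(?!approv)[a4]ppr[o0]v/i",
    r"body DEMO_EXPERIENCE /\b(?!experience)[e3]xp[e3]rience/i",
    r"header DEMO_SUBJ Subject =~ /\b(?!offer)[o0]ffer/i",
    r"describe DEMO_APPROV Obfuscated 'approv'",
]
print(guarded_words(sample))
```

To run it over real files, pass the lines of each rules file to
guarded_words() instead of the sample list.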
