Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

John Wilcock Tue, 24 Jun 2008 04:51:44 -0700

Justin Mason wrote:

John GALLET writes:
Well, thanks for writing it. I think its main weak point for French and other accented languages is handling the different encodings for a same char with an accent, some kind of "synonyms" list. The same letter, say "a with an accent", can be misspelled with a plain "a", encoded in various charsets (latin, utf-8) to a "normal" à, or html encoded agrave (I left & and ; out). I do not know if it is possible at all, it might complicate things *a lot*.
The tool can take care of this -- it will replace mutating single-characters
with a /./.  It also supports /.?/, /.{0,3}/, /.{0,10}/ and a few other
"any" patterns.

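To make the encoding point concrete, here is a small Python sketch. It is not part of the tool discussed above, and the sample word is made up for illustration. It shows the same accented letter arriving in the forms mentioned (plain "a", Latin-1 bytes, UTF-8 bytes, HTML entity) and how a single-character wildcard covers all of them once the text has been decoded to characters. On raw UTF-8 bytes the accented letter is two bytes long, so in Python a byte-level /./ does not cover it while a /.{0,3}/ style pattern does.

```python
import html
import re

# The same word as it might arrive in different messages (illustrative data).
variants = [
    "voil\u00e0",                      # properly accented
    "voila",                           # misspelled with a plain "a"
    b"voil\xe0".decode("latin-1"),     # ISO-8859-1 bytes, decoded
    b"voil\xc3\xa0".decode("utf-8"),   # UTF-8 bytes, decoded
    html.unescape("voil&agrave;"),     # HTML entity, decoded
]

# On decoded text, one wildcard character covers every variant.
wildcard = re.compile(r"voil.")
matches = [bool(wildcard.fullmatch(v)) for v in variants]

# On raw UTF-8 bytes the accented letter occupies two bytes,
# so a single "." is not enough but ".{0,3}" is.
raw = b"voil\xc3\xa0"
raw_single = re.fullmatch(rb"voil.", raw)
raw_range = re.fullmatch(rb"voil.{0,3}", raw)

print(matches)
print(raw_single is None, raw_range is not None)
```

Whether a rule sees decoded text or raw bytes depends on how the message is processed before matching, which this sketch does not try to model.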

If the number of permutations is small (as would be the case for
accented letters and the equivalent unaccented ones, or for that matter
obfuscation with lookalike characters), wouldn't it be better for it to
replace the character by a [] list of those permutations (e.g. replace
something that mutates between e and é with [eé], or replace obfuscation
of i with l and 1 by [il1])?
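
A minimal Python sketch of this suggestion, under stated assumptions: the function name generalize, the max_class threshold and the sample words are all hypothetical, and this is not how the tool itself works. For equal-length samples, each position that varies becomes a [] list when the number of distinct characters is small, and falls back to a single-character wildcard otherwise.

```python
import re

def generalize(samples, max_class=4):
    """Build a regex from equal-length samples (hypothetical helper).

    Positions where all samples agree stay literal; positions with a few
    distinct characters become a [] class; anything wider becomes ".".
    """
    if len({len(s) for s in samples}) != 1:
        raise ValueError("samples must all have the same length")
    parts = []
    for chars in zip(*samples):
        seen = sorted(set(chars))
        if len(seen) == 1:
            parts.append(re.escape(seen[0]))
        elif len(seen) <= max_class:
            parts.append("[" + "".join(re.escape(c) for c in seen) + "]")
        else:
            parts.append(".")
    return "".join(parts)

# Lookalike obfuscation of "i" with "l" and "1".
pattern = generalize(["gratuit", "gratult", "gratu1t"])
print(pattern)  # gratu[1il]t

# Accented versus unaccented spelling.
accent_pattern = generalize(["élevé", "eleve"])
print(bool(re.fullmatch(accent_pattern, "élevé")))
```

Compared with a plain wildcard, the class only matches the variants actually observed, so "gratuxt" would not match the first pattern.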

John.

--
-- Over 3000 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages    - www.tradoc.fr
