http://bugzilla.spamassassin.org/show_bug.cgi?id=3163





------- Additional Comments From [EMAIL PROTECTED]  2004-03-12 15:10 -------
> I'm concerned that [A-Za-z] and \w are too locale-specific, so I'd like
> to figure out exactly why they improve results so much over \S.

HTML like this, where punctuation follows something in an anchor, is
probably fairly common:

<a href="mailto:[EMAIL PROTECTED]"><u>[EMAIL PROTECTED]</a></u>;

so that might be why it reduces false positives (\S would match the
semicolon, whereas [A-Za-z] and \w would not).  I guess using
[A-Za-z] instead of \S would reduce the number of true positives in
non-Roman messages, but most spam (that I receive) uses Roman
characters, so I'm not sure why you are surprised at the improvement
it gets.  Maybe I misunderstood the comment.
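
To make the semicolon point concrete, here is a minimal sketch in
Python.  The pattern is a hypothetical stand-in (one character, one or
more HTML tags, then another character), not the actual rule from this
bug, and user@example.com is a placeholder for the redacted address.

```python
import re

# Hypothetical "tag gap" pattern, NOT the actual SpamAssassin rule:
# one character, one or more HTML tags, then another character.
TAG_GAP = r"(?:<[^>]*>)+"
LOOSE = re.compile(r"\S" + TAG_GAP + r"\S")                # any non-space
STRICT = re.compile(r"[A-Za-z]" + TAG_GAP + r"[A-Za-z]")   # ASCII letters only

# Ham-like anchor followed by punctuation (placeholder address).
ham = '<a href="mailto:user@example.com"><u>user@example.com</a></u>;'
# Obfuscated text: a word split by an HTML comment.
obfuscated = "fr<!-- x -->ee"


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for name, pat in (("\\S", LOOSE), ("[A-Za-z]", STRICT)):
    print(name, "ham:", hits(pat, ham), "obfuscated:", hits(pat, obfuscated))
```

With these assumptions, the \S version fires on the ham anchor because
\S accepts characters such as ">" and ";" next to a tag, while the
[A-Za-z] version fires only on the split word.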

The reduction in the number of hits on spam messages in the tables
above is probably because the earlier hits included false positives
in spam messages that do not contain obfuscation.  Perhaps the false
positives in ham messages can be reduced further; I'm going to look
for ham that the rules hit so that the rules can be tweaked.




