Dan,

Friday, February 20, 2004, 1:24:58 AM, you wrote:

> http://bugzilla.spamassassin.org/show_bug.cgi?id=3071
> [EMAIL PROTECTED] changed:
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |WONTFIX

> ------- Additional Comments From [EMAIL PROTECTED]  2004-02-20 01:24 -------
> We have some other bugs open for Bayes poison.  This will FP really badly,
> especially on non-English texts (including stuff like programs and non-prose).

Agreed.  I found that these rules, as originally posted, had many ham hits on
my system -- I've been working through those, and currently use the rules as follows:
body     AR_WORDLIST_10   /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){10}/
describe AR_WORDLIST_10   string of 10+ random words
score    AR_WORDLIST_10   2.000  # type=max:2.0 - 13135s/3h of 100795 corpus (82099s/18696h) 02/16/04
                                 # ham: verified (3)
body     AR_WORDLIST_13   /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){13}/
describe AR_WORDLIST_13   string of 13+ random words
score    AR_WORDLIST_13   3.000  # 12497s/1h of 100795 corpus (82099s/18696h) 02/16/04
                                 # ham: email address list
body     AR_WORDLIST_18   /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){18}/
describe AR_WORDLIST_18   string of 18+ random words
score    AR_WORDLIST_18   7.650  # type=spamg - 11130s/0h of 100795 corpus (82099s/18696h) 02/16/04
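If you want to try these patterns on sample text outside SpamAssassin, here is a rough sketch. It uses Python's re module as a stand-in for Perl's regex engine. The constructs involved ((?:...), (?!...), \b, {m,n}, \s) should behave the same way for this ASCII-only pattern. SpamAssassin itself applies body rules to the decoded message body, not to a bare string like this, so treat any results as approximate. The helper names (wordlist_pattern, hits) are just for illustration.

```python
import re

# Stopword alternation copied verbatim from the AR_WORDLIST rules above.
STOPWORDS = (r"(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]"
             r"|this|very|wh?ere|which|will|with|your)")

def wordlist_pattern(n: int) -> re.Pattern:
    """Rebuild the AR_WORDLIST_<n> body regex: n consecutive lowercase words
    of 4-12 letters, none of them a listed stopword, each followed by whitespace."""
    return re.compile(r"(?:\b(?!" + STOPWORDS + r"\b)[a-z]{4,12}\s+){" + str(n) + "}")

# Hypothetical helper: one compiled pattern per rule, in threshold order.
RULES = {f"AR_WORDLIST_{n}": wordlist_pattern(n) for n in (10, 13, 18)}

def hits(text: str) -> list[str]:
    """Return the names of the rules whose pattern matches anywhere in text."""
    return [name for name, pat in RULES.items() if pat.search(text)]

# 12 words, 11 of them followed by whitespace: enough for _10, not for _13.
print(hits("bucket fender glimmer harvest jacket kitten lantern marble nectar orchid pepper quartz"))
```

Because the rules have no /i flag, a capitalized word cannot match [a-z]{4,12}, so it breaks a run in the same way that a stopword or a word under 4 letters does.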

> We have one rule in testing right now which looks at word distributions to
> detect random words -- it works quite well.

Look forward to that enhancement.  Thanks.

Bob Menschel