-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>> >Here is another question about the Bayes numbers.  Assuming no funny
>> >business with merging or something like that, a particular token's
>> >spam/ham count must be <= the number of spam/ham messages learned,
>> >right?
>> >
>> >So, if you've only learned 100000 spams, but a token has a spam count
>> >of 1+ billion, there is probably something wrong, right?
>> 
>> Yep.  I would suggest skipping such a token during reload in that case.
>
>Why do we do this? According to Paul Graham's site, we'd get better
>results if the number of occurrences in the DB were the total number of
>occurrences, not the number of messages in which the token was seen.

If I recall correctly, the SpamBayes guys analysed this and found it
to be untrue, or at least an artifact of his method.
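A minimal sketch of why the two counting schemes matter for the reload check discussed in the quoted exchange. Everything here is hypothetical and illustrative: `BayesDB`, `learn`, `occurrence_counts` and `sane_tokens` are invented names, and the record layout is not SpamAssassin's actual database schema or `sa-learn` backup format. When each token is counted at most once per message, a token's spam/ham count can never exceed the number of spam/ham messages learned, so a record that breaks that bound can safely be skipped on reload. Under Graham-style occurrence counting the bound no longer holds, and a token that simply repeats inside one message would be wrongly flagged.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int, int]  # hypothetical (token, spam_count, ham_count)


@dataclass
class BayesDB:
    """Hypothetical in-memory token store, not SpamAssassin's real schema."""
    nspam: int = 0
    nham: int = 0
    spam_counts: Counter = field(default_factory=Counter)
    ham_counts: Counter = field(default_factory=Counter)

    def learn(self, tokens: Iterable[str], is_spam: bool) -> None:
        # Count each token at most once per message, so a token's count
        # can never exceed the number of messages learned in that class.
        unique = set(tokens)
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(unique)
        else:
            self.nham += 1
            self.ham_counts.update(unique)


def occurrence_counts(messages: Iterable[List[str]]) -> Counter:
    """Graham-style alternative: count every occurrence of every token."""
    counts: Counter = Counter()
    for tokens in messages:
        counts.update(tokens)
    return counts


def sane_tokens(nspam: int, nham: int,
                records: Iterable[Record]) -> Iterator[Record]:
    """Yield only records whose counts are possible given the message
    totals; anything else is assumed corrupt and skipped during reload."""
    for token, spam, ham in records:
        if 0 <= spam <= nspam and 0 <= ham <= nham:
            yield token, spam, ham


if __name__ == "__main__":
    db = BayesDB()
    db.learn(["cheap", "cheap", "pills"], is_spam=True)
    db.learn(["meeting", "cheap"], is_spam=False)
    print(db.spam_counts["cheap"])  # 1: counted once for the one spam
    print(occurrence_counts([["cheap", "cheap", "pills"]])["cheap"])  # 2
    records = [("cheap", 1, 1), ("pills", 1, 0), ("corrupt", 1_000_000_000, 0)]
    print(list(sane_tokens(db.nspam, db.nham, records)))
    # [('cheap', 1, 1), ('pills', 1, 0)]
```

The check only works because of the per-message counting in `learn`: with `occurrence_counts`, "cheap" would already exceed `nspam` after a single spam, so the same filter would discard a perfectly valid token.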

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAO9GQQTcbUG5Y7woRAqjZAJ9GGrYz8zHUXVPRVbSwhBSaVJvDtwCdGjKh
NmSAQiVBaOrY5R8moDuxAu4=
=uWoZ
-----END PGP SIGNATURE-----