>> >Here is another question about the bayes numbers. Assuming no funny
>> >business with merging or something like that, a particular token's
>> >spam/ham count must be <= the number of spam/ham messages learned,
>> >right?
>> >
>> >So, if you've only learned 100000 spams, but a token has a spam count
>> >of 1+ billion, there is probably something wrong, right?
>>
>> yep. Would suggest skipping such a token during reload in that case.
>
>Why do we do this? According to Paul Graham's site, we'd get better
>results if the number of occurrences in the DB was the total number of
>occurrences, not the number of messages in which the token was seen.

If I recall correctly, the spambayes guys analysed this and found it
to be untrue, or at least an artifact of his method.

--j.
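
For what it's worth, the sanity check discussed above could look roughly like the sketch below. It is only an illustration, not the actual reload code: the function name `filter_tokens`, the dict layout, and the sample numbers are all made up, and it assumes each token's counts are per-message counts (which is what makes the upper bound valid).

```python
def filter_tokens(tokens, nspam, nham):
    """Split tokens into (kept, skipped) using the per-message count bound.

    tokens maps token -> (spam_count, ham_count); nspam and nham are the
    numbers of spam and ham messages learned. All names are hypothetical.
    """
    kept, skipped = {}, {}
    for token, (spam_count, ham_count) in tokens.items():
        # A token seen at most once per message cannot exceed the
        # number of messages learned in either class.
        ok = 0 <= spam_count <= nspam and 0 <= ham_count <= nham
        (kept if ok else skipped)[token] = (spam_count, ham_count)
    return kept, skipped


tokens = {
    "viagra": (90_000, 3),
    "meeting": (12, 40_000),
    "corrupt": (1_500_000_000, 0),  # more spam hits than spams learned
}
kept, skipped = filter_tokens(tokens, nspam=100_000, nham=50_000)
print(sorted(kept))     # ['meeting', 'viagra']
print(sorted(skipped))  # ['corrupt']
```

Note that this bound would no longer hold if the DB stored total occurrences rather than the number of messages containing the token.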
