Hi all, does anyone here have a good grasp of the mathematics behind SpamAssassin's Bayesian learning?
I assume that training a fresh BayesStore with a set of spam and ham samples is mathematically sound. What bothers me a little is the expiration logic. The purpose of expiration seems to be a practical one: we don't want the BayesStore to grow too much. But is there a conceptual counterpart? One such concept could be: maintain the store as if it had been trained from scratch on only the spam and ham mails from the last N days.

However, if I am not mistaken, that is not how it is implemented. The nspam and nham magic counters mostly only increase. They decrease when a message is forgotten or relearnt, but not on expiration.

If I am not mistaken, there are also conceptual differences between some BayesStore implementations. PgSQL will expire tokens if configured to, but it will not expire seen messages. Redis, on the other hand, expires both tokens and seen messages (and, in the default configuration, with a huge difference between the two TTLs).

As a result, after some time, most BayesStores are probably in a state that no set of mail samples could produce via training. Can such a state still lead to statistically valid conclusions? Can both implementations be correct?

Damian
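To make the first concern concrete, here is a toy sketch in Python. It assumes a simplified, unsmoothed ratio-based token probability (SpamAssassin's real computation adds smoothing and combines many tokens) and an assumed expiry policy that drops tokens not seen in the last N days while leaving nspam/nham untouched. `Message`, `train`, `expire_tokens`, `window_retrain` and `token_prob` are hypothetical names, not SpamAssassin code.

```python
from dataclasses import dataclass

# Toy model only: all names here are hypothetical, not SpamAssassin's API.
@dataclass(frozen=True)
class Message:
    is_spam: bool
    age_days: int      # how many days ago the message was learned
    tokens: frozenset  # a set, so each token counts at most once per message


def train(messages):
    """Train from scratch: per-token counts plus the nspam/nham counters."""
    nspam = nham = 0
    counts = {}  # token -> [spam_count, ham_count, age of most recent sighting]
    for m in messages:
        if m.is_spam:
            nspam += 1
        else:
            nham += 1
        for t in m.tokens:
            c = counts.setdefault(t, [0, 0, m.age_days])
            c[0 if m.is_spam else 1] += 1
            c[2] = min(c[2], m.age_days)
    return nspam, nham, counts


def expire_tokens(store, max_age_days):
    """Assumed expiry policy: drop tokens not seen recently, keep nspam/nham."""
    nspam, nham, counts = store
    kept = {t: c for t, c in counts.items() if c[2] <= max_age_days}
    return nspam, nham, kept


def window_retrain(messages, max_age_days):
    """The 'conceptual counterpart': retrain on the last N days only."""
    return train([m for m in messages if m.age_days <= max_age_days])


def token_prob(store, token):
    """Simplified ratio-based P(spam | token), without any smoothing."""
    nspam, nham, counts = store
    if token not in counts or nspam == 0 or nham == 0:
        return None
    s, h, _ = counts[token]
    ratio_s, ratio_h = s / nspam, h / nham
    if ratio_s + ratio_h == 0:
        return None
    return ratio_s / (ratio_s + ratio_h)


corpus = [
    Message(True, 100, frozenset({"viagra", "offer"})),
    Message(True, 100, frozenset({"offer"})),
    Message(True, 5, frozenset({"free"})),
    Message(False, 5, frozenset({"meeting", "offer"})),
    Message(False, 5, frozenset({"meeting"})),
]

expired = expire_tokens(train(corpus), 30)
windowed = window_retrain(corpus, 30)
print(expired[:2], round(token_prob(expired, "offer"), 3))    # (3, 2) 0.571
print(windowed[:2], round(token_prob(windowed, "offer"), 3))  # (1, 2) 0.0
```

In this toy run, "offer" survives expiry because a recent ham refreshed it, so it keeps the counts from the two old spams, and nspam still includes those old spams as well. The expired store therefore rates "offer" at about 0.571, while a store retrained on the last 30 days rates it at 0.0.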