[Neil Schemenauer]
> I have an idea for a spambayes variation that should be more suited
> to multi-user systems. The goal is to make the DB somewhat
> conditionalized based on recipient address. In addition to storing
> <token>, spambayes could also save (<recipient>, <token>). When
> scoring a message, the probability for (<recipient>, <token>) would
> be added to the evidence as well as for <token>.

Offhand I think it would make more sense to ignore <token> when a
(<recipient>, <token>) pair (for the same <token> and the given
<recipient>) is known. For example, if a urologist trains on "penis" as
ham, it's not doing him a favor to fold in that it's spam to almost
everyone else.

> I'm looking at chi2_spamprob() and wondering if this is valid,
> statistics-wise.

There's really no sense in which chi2_spamprob() computes "a
probability" -- it works or it doesn't. Heh.

> Is there some better way to include the (<recipient>, <token>) evidence?

Test some ;-)

> BTW, if this idea actually works, using (<sender>, <token>) may also
> be helpful.

Spam sender addresses typically change rapidly, while ham sender
addresses typically don't. So I expect this would add major boosts to
the tokens sent by ham senders, and typically create a ton of hapaxes
from spam senders (due to the spam <sender> addresses constantly
changing).
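
The preference suggested above (use the (<recipient>, <token>) pair when
one is known, and fall back to the bare <token> otherwise) might be
sketched roughly like this. It is only an illustration: ConditionalCounts,
train(), evidence_key() and spam_ratio() are hypothetical names, not part
of spambayes, and the raw spam ratio stands in for whatever per-token
probability the classifier really uses.

```python
from collections import defaultdict


class ConditionalCounts:
    """Hypothetical store of ham/spam counts, keyed by <token> and by
    (<recipient>, <token>). Not the spambayes database format."""

    def __init__(self):
        # key -> [ham_count, spam_count]
        self.counts = defaultdict(lambda: [0, 0])

    def train(self, recipient, tokens, is_spam):
        idx = 1 if is_spam else 0
        for token in set(tokens):
            # Record the evidence both globally and per recipient.
            self.counts[token][idx] += 1
            self.counts[(recipient, token)][idx] += 1

    def evidence_key(self, recipient, token):
        """Pick the key to score with: the recipient-specific pair if it
        has been trained, else the bare token, else None."""
        pair = (recipient, token)
        if pair in self.counts:
            return pair
        if token in self.counts:
            return token
        return None

    def spam_ratio(self, key):
        # Assumption: a plain ratio, purely for illustration.
        ham, spam = self.counts[key]
        return spam / (ham + spam)
```

With this selection rule, a recipient who trained "penis" as ham is scored
on his own counts only, while a recipient with no history for that token
still gets the global counts.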