Joe Emenaker wrote:
> Ole Nomann Thomsen wrote:
>
>>Hi, I noticed that words, common to all mails, seem to get a spamvalue of
>>close to zero, as in:
[...]
> When you think about it, this makes a little more sense. You want to be
> able to scan a message, take a word from the message and, using that
> word, estimate the odds that the message is spam. If you receive 99 hams
> for each spam you get, then looking up "Subject" would tell you that
> there's a 1% chance that the message is spam.
>
> Get it?
>
> - Joe

Yes, that makes good sense. There's a minor catch, though: the thresholds for automatic training (and the general unpredictability of manual training) will skew the result. Say I receive 30% ham/70% spam, but the filter gets trained on 60% ham/40% spam. The true probability that a mail containing "Subject" is spam should then be 70%, but the training will put this probability at 40%. I wonder whether that's a problem, and if not, why not?
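To make the arithmetic concrete, here is a rough sketch in Python. It assumes a naive count-based estimate (spam messages containing the word divided by all messages containing it), which is only a stand-in for whatever formula the filter really uses. The function names and the message counts are made up for illustration.

```python
def word_spam_probability(spam_with_word, ham_with_word):
    """Naive estimate: fraction of messages containing the word that are spam.

    This is an illustrative assumption, not the filter's actual formula.
    """
    total = spam_with_word + ham_with_word
    if total == 0:
        raise ValueError("word never seen in training")
    return spam_with_word / total


def ubiquitous_word_estimate(n_spam, n_ham):
    """A word present in every message (like "Subject") is counted once per
    message, so its estimate is just the spam share of the corpus."""
    return word_spam_probability(n_spam, n_ham)


# Joe's example: 99 hams for each spam.
joe = ubiquitous_word_estimate(n_spam=1, n_ham=99)

# Mail actually received: 30% ham / 70% spam (hypothetical counts per 100 mails).
received = ubiquitous_word_estimate(n_spam=70, n_ham=30)

# Mail the filter was trained on: 60% ham / 40% spam.
trained = ubiquitous_word_estimate(n_spam=40, n_ham=60)

print(f"Joe's example:      {joe:.0%}")
print(f"true mail stream:   {received:.0%}")
print(f"training corpus:    {trained:.0%}")
```

Under these assumptions the estimate for "Subject" follows the training mix (40%) rather than the incoming mix (70%), which is exactly the skew I am asking about.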
