Joe Emenaker wrote:

> Ole Nomann Thomsen wrote:
> 
>>Hi, I noticed that words common to all mails seem to get a spamvalue of
>>close to zero, as in:

[...]

> When you think about it, this makes a little more sense. You want to be
> able to scan a message, take a word from the message and, using that
> word, estimate the odds that the message is spam. If you receive 99 hams
> for each spam you get, then looking up "Subject" would tell you that
> there's a 1% chance that the message is spam.
> 
> Get it?
> 
> - Joe

Yes, that makes good sense. There's a minor catch, though: the thresholds for
automatic training (and the general unpredictability of manual training)
will skew the result.

Say I receive 30% ham/70% spam, but the filter gets trained on 60% ham/40%
spam. The true probability that a letter containing "Subject" is spam should
then be 70%, but the training will put this probability at 40%.
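
To make the numbers concrete, here is a rough sketch of that arithmetic. It
assumes the simplest possible estimate (spam count divided by total count for
messages containing the word), which is not necessarily the formula the filter
really uses. The function name and the counts are made up for illustration.

```python
def token_spam_probability(spam_count, ham_count):
    """Naive estimate: fraction of messages containing the token that are spam."""
    return spam_count / (spam_count + ham_count)

# "Subject" appears in every message, so its per-token counts are simply
# the message counts. Both sets of counts below are hypothetical.
received = {"spam": 70, "ham": 30}  # the mail that actually arrives
trained = {"spam": 40, "ham": 60}   # the mail the filter was trained on

true_p = token_spam_probability(received["spam"], received["ham"])
learned_p = token_spam_probability(trained["spam"], trained["ham"])

print(f"true: {true_p:.0%}, learned: {learned_p:.0%}")
```

Under this naive estimate, the learned value follows the training mix and
not the incoming mix, which is the skew I mean.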

I wonder whether that's a problem, and if not, why not?

