Re: A very long spam

Thomas Arend 9 Jan 2005 11:18:13 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday, 8 January 2005 22:55, Fajar Priyanto wrote:
> On Sunday 09 January 2005 04:47 am, Matt Kettler wrote:
[..]


> > Train spam as spam, train ham as ham. Let the statistics deal with the
> > overlap. By trying to avoid training "spamish" ham or "hamish" spam
> > you're just doing your training a big disservice by making it
> > unrealistic.
>
> Thanks Matt,
> So, statistically speaking, does that mean I have to train SA on as much
> 'ham' as 'spam'? Right now, I train SA mostly on spam.

You must train both ham and spam. How should the Bayes filter know what ham is
if you never trained it?

As far as I understand it, the Bayes filter searches for tokens in the email.
If a token was found in 30 spam and 10 ham mails, then the probability of a
mail containing it being spam is 75%. But if you train only spam, the Bayes
filter would say: I have learned 30 spam mails with this token but no ham, so
the probability of being spam is 100%.
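
To make the arithmetic concrete, here is a minimal Python sketch of that
simplified count-ratio model. It only illustrates the reasoning above and is
not how SpamAssassin actually computes its Bayes score; the function name
naive_spam_probability is made up for this example.

```python
def naive_spam_probability(spam_count, ham_count):
    """Share of trained mails containing a token that were spam.

    Hypothetical helper for the simplified model in this message,
    not SpamAssassin's real algorithm.
    """
    total = spam_count + ham_count
    if total == 0:
        # Token never seen in training: no estimate possible.
        return None
    return spam_count / total


# Token seen in 30 spam and 10 ham mails.
print(naive_spam_probability(30, 10))  # 0.75

# Same token, but no ham was ever trained.
print(naive_spam_probability(30, 0))   # 1.0
```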

(The Bayes calculation combines a number of ham and spam tokens. I don't know
how many tokens are taken into account.)

If you train only or mostly spam, this will poison your database and the
false positives will grow. To keep false positives low, you should teach SA
all your ham.

You are unlikely to train as much ham as spam, because there is more spam.
But this does no harm. The Bayesian filter works on the tokens it finds. Let's
assume you have taught it 200 spam and 100 ham mails, and that 100 spam and
100 ham mails contained the token x. If x is found in a new message, then the
spam probability is 50%, even though x appears in 100% of the ham messages.

If you teach only half of the ham messages, the spam-to-ham ratio for x would
be 100 to 50, which gives a probability of about 67% for being spam.
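
The two cases can be compared with a short Python sketch. Again, this assumes
the simplified count-ratio model used in this message, not SpamAssassin's real
calculation, and the name count_ratio is purely illustrative.

```python
def count_ratio(spam_with_token, ham_with_token):
    """Spam share among trained mails that contain the token.

    Simplified model from this message (assumption), not the
    actual SpamAssassin Bayes computation.
    """
    return spam_with_token / (spam_with_token + ham_with_token)


# Token x seen in 100 spam mails and in all 100 trained ham mails.
all_ham = count_ratio(100, 100)

# Same spam training, but only half of the ham was trained.
half_ham = count_ratio(100, 50)

print(f"all ham trained:  {all_ham:.0%}")   # 50%
print(f"half ham trained: {half_ham:.0%}")  # 67%
```

Leaving out half of the ham pushes the same token from neutral towards spam,
which is how under-trained ham leads to false positives.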


Regards

Thomas

- -- 
icq:133073900
http://www.t-arend.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFB4RLeHe2ZLU3NgHsRAjgSAKCHYwQWLMJExHdtrgb0OLXHHy00XwCeKIyw
Y7oZeRBZ22sOlpZFmc5Ln7M=
=i9Cw
-----END PGP SIGNATURE-----
