-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 16/10/11 17:09, Jesus Cea wrote:
> After a while the detection rate slowly gets worse, and training
> gets slower (the probability changes far less after each training
> cycle).

One point stressed frequently is that the numbers of ham and spam
messages trained should be similar. In my case they are not. My
current database counts are:

HAM: 150
SPAM: 27919

The counters are so unbalanced because:

1. I only train on misclassifications and "unsures". Misclassifications
are rare (thanks!) and >99% of "unsures" are spam.

2. When I train on a message, I keep training in a loop until the
message probability drops below 20% (ham) or rises above 90% (spam).
As the database ages, spam training needs more loop iterations because
the probability rises more slowly. Ham training, however, is fast and
needs few iterations.
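
To make point 2 concrete, here is a rough sketch of the kind of loop I
mean. It does not use the SpamBayes API: ToyClassifier, token_prob,
train_until and max_rounds are hypothetical names made up for
illustration, and the scoring (a plain average of smoothed per-token
probabilities) is a deliberate simplification, not the real combining
scheme. Only the 20% and 90% cutoffs come from my setup.

```python
from collections import Counter


class ToyClassifier:
    """Tiny token-count classifier for illustration; NOT SpamBayes."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def learn(self, tokens, is_spam):
        # Each training pass counts every distinct token once.
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(set(tokens))
        else:
            self.nham += 1
            self.ham_counts.update(set(tokens))

    def token_prob(self, token, s=0.45, x=0.5):
        # Per-token spam probability, normalised by how many ham and
        # spam messages were trained, then pulled toward the prior x
        # with strength s when the token has been seen only rarely.
        sc = self.spam_counts[token]
        hc = self.ham_counts[token]
        n = sc + hc
        if n == 0:
            return x
        spam_ratio = sc / max(self.nspam, 1)
        ham_ratio = hc / max(self.nham, 1)
        p = spam_ratio / (spam_ratio + ham_ratio)
        return (s * x + n * p) / (s + n)

    def score(self, tokens):
        # Simplification: average the per-token probabilities.
        toks = set(tokens)
        if not toks:
            return 0.5
        return sum(self.token_prob(t) for t in toks) / len(toks)


def train_until(clf, tokens, is_spam,
                spam_cutoff=0.90, ham_cutoff=0.20, max_rounds=50):
    """Retrain on one message until its score crosses the cutoff.

    Returns (rounds, final_score). max_rounds is a safety cap that my
    real loop does not have; it is an assumption added here.
    """
    rounds = 0
    while rounds < max_rounds:
        score = clf.score(tokens)
        done = score > spam_cutoff if is_spam else score < ham_cutoff
        if done:
            break
        clf.learn(tokens, is_spam)
        rounds += 1
    return rounds, clf.score(tokens)


if __name__ == "__main__":
    clf = ToyClassifier()
    print(train_until(clf, ["cheap", "pills", "offer"], is_spam=True))
    print(train_until(clf, ["meeting", "agenda"], is_spam=False))
```

Note that every pass through the loop increments the spam (or ham)
message counter as well as the token counts, so this style of training
inflates the trained totals by itself.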

Suggestions?

- -- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
j...@jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:j...@jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBTpr4nJlgi5GaxT1NAQKTiAP+MfyHr2cY7i64dNSex+6OmSgmVNwXPNwk
3mpMC3if3HNNj0RgsZxZA5PjqMn07KISgZ7vVLXuLYmS3WNq2tUqM2nLevaa6g3N
YTrOCbUWmfnvAfg9KiU0YebMn4SLHOeqNJEZyCd6Pbz6lclH4aQuOdKUSdg4F8rB
AsCH0LE8wVE=
=vy3L
-----END PGP SIGNATURE-----
_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
