On 16/10/11 17:09, Jesus Cea wrote:
> After a while the detection rate slowly gets worse, and training
> gets slower (the probability moves far less after each training
> cycle).

One point that is stressed frequently is that the number of ham and spam
messages trained should be similar. I am not doing that. My current
database counts are:

HAM: 150
SPAM: 27919

The counters are so unbalanced because:

1. I only train on misclassifications and "unsures". Misclassifications
   are rare (thanks!) and >99% of the "unsures" are spam.

2. When I train on a message, I keep training it in a loop until the
   message probability goes under 20% (ham) or over 90% (spam).

As the database ages, training spam needs more "looping"; that is, the
probability goes up slowly. Ham training, on the other hand, is fast and
the loop count is low.

Suggestions?

--
Jesus Cea Avion
j...@jcea.es - http://www.jcea.es/
jabber / xmpp:j...@jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

_______________________________________________
SpamBayes@python.org
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
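
The train-until-threshold loop described above can be sketched in Python roughly as follows. This is a minimal sketch using a deliberately simplified toy classifier, not SpamBayes itself: `ToyClassifier`, `train_until_confident`, `rounds_needed`, the averaging of per-token probabilities, and the smoothing constants are all assumptions made for illustration. Only the 20% / 90% cutoffs and the 150 / 27919 counts come from the message. In this toy model, token counts are normalized by the number of messages trained per class, so when spam vastly outnumbers ham each extra spam training moves the score less. That is one possible reading of the slowdown, not a diagnosis of the real database.

```python
from collections import Counter


class ToyClassifier:
    """Simplified token-count classifier for illustration (NOT SpamBayes)."""

    def __init__(self, strength=0.45, unknown=0.5):
        # Assumed smoothing constants: weight and value of the prior
        # probability given to a token with little or no evidence.
        self.strength = strength
        self.unknown = unknown
        self.nham = 0
        self.nspam = 0
        self.ham_counts = Counter()
        self.spam_counts = Counter()

    def train(self, tokens, is_spam):
        """Count each distinct token once for the given class."""
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(set(tokens))
        else:
            self.nham += 1
            self.ham_counts.update(set(tokens))

    def token_prob(self, token):
        """Smoothed spam probability of a single token."""
        h = self.ham_counts[token]
        s = self.spam_counts[token]
        # Normalize by messages trained per class.
        ham_ratio = h / self.nham if self.nham else 0.0
        spam_ratio = s / self.nspam if self.nspam else 0.0
        if ham_ratio + spam_ratio == 0:
            return self.unknown
        p = spam_ratio / (ham_ratio + spam_ratio)
        n = h + s
        return (self.strength * self.unknown + n * p) / (self.strength + n)

    def score(self, tokens):
        """Toy combining rule: plain average of the token probabilities."""
        distinct = set(tokens)
        if not distinct:
            return self.unknown
        return sum(self.token_prob(t) for t in distinct) / len(distinct)


def train_until_confident(clf, tokens, is_spam,
                          ham_cutoff=0.20, spam_cutoff=0.90,
                          max_rounds=100000):
    """Retrain on one message until its score crosses the cutoff.

    Returns the number of training rounds that were needed.
    """
    rounds = 0
    while rounds < max_rounds:
        p = clf.score(tokens)
        if (is_spam and p > spam_cutoff) or (not is_spam and p < ham_cutoff):
            break
        clf.train(tokens, is_spam)
        rounds += 1
    return rounds


def rounds_needed(nham, nspam, token="offer"):
    """Hypothetical scenario: a token seen in one ham message, now trained
    as spam against a database with the given ham/spam message counts."""
    clf = ToyClassifier()
    clf.nham, clf.nspam = nham, nspam
    clf.ham_counts[token] = 1
    return train_until_confident(clf, [token], is_spam=True)
```

Comparing `rounds_needed(150, 150)` with `rounds_needed(150, 27919)` shows how the loop count grows with the imbalance in this toy model. The `max_rounds` cap is there so the loop cannot spin forever on a message whose score never crosses the cutoff.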