On Wed, September 27, 2006 11:38 am, Nels Lindquist said:
> Daniel T. Staal wrote:
>
>> On Wed, September 27, 2006 11:10 am, Jim Maul said:
>>
>>> I believe that SA will not learn a message it has seen before so
>>> multiple sa-learn's will not have any effect.
>>
>> Actually, that was my impression too.
>>
>> Which means, for the original question, that re-learning the already
>> caught spams will have very little effect other than wasting some
>> processor cycles. Doing what he is doing right now is probably best.
>
> Except that there's a significant difference between "already caught" and
> "already learned" spam. The threshold for learning is much higher (and
> has specific requirements WRT point contributions of various types) so
> it's definitely possible to have, for example, a message that was
> correctly flagged as spam entirely due to network tests that was not
> auto-learned. Training such messages then reinforces Bayes on the
> content side, so future messages that look similar but perhaps have a new
> URL that hasn't hit the blacklists yet can still be flagged.

True. So... Optimal is obviously to train, once and correctly, on all
messages. Running sa-learn on a message that has already been trained
will still consume *some* resources, but less than one that still needs
to be learned. So the exact balance is a complicated question. ;)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
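
To make the "already caught" versus "already learned" distinction concrete, here is a minimal Python sketch. It is a simplified model, not SpamAssassin's actual code: the `Scores` class and the three helper functions are hypothetical names, the numeric defaults are taken from my reading of the SpamAssassin documentation (required_score 5.0, bayes_auto_learn_threshold_spam 12.0, at least 3 points each from header and body rules), and the real auto-learn score is computed somewhat differently from the final score.

```python
from dataclasses import dataclass

# Assumed defaults, as I understand the SpamAssassin docs; check your own config.
REQUIRED_SCORE = 5.0             # score at which a message is flagged as spam
AUTOLEARN_SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam
MIN_HEADER_POINTS = 3.0          # minimum header-rule points for spam auto-learn
MIN_BODY_POINTS = 3.0            # minimum body-rule points for spam auto-learn


@dataclass
class Scores:
    """Hypothetical container: points split by the kind of rule that fired."""
    header_points: float  # e.g. DNS blacklist hits on relays (assumed header-type)
    body_points: float    # e.g. content rules


def is_flagged(s: Scores) -> bool:
    """The message was 'caught': total score reaches the spam threshold."""
    return s.header_points + s.body_points >= REQUIRED_SCORE


def is_autolearned_as_spam(s: Scores) -> bool:
    """The message was 'learned': a stricter test than merely being flagged."""
    total = s.header_points + s.body_points
    return (
        total >= AUTOLEARN_SPAM_THRESHOLD
        and s.header_points >= MIN_HEADER_POINTS
        and s.body_points >= MIN_BODY_POINTS
    )


def worth_manual_training(s: Scores) -> bool:
    """Caught but not auto-learned: the case where sa-learn still adds value."""
    return is_flagged(s) and not is_autolearned_as_spam(s)


# A spam caught almost entirely by network (blacklist) tests.
network_only = Scores(header_points=9.0, body_points=0.5)
# A spam that scores high on both header and body rules.
obvious_spam = Scores(header_points=8.0, body_points=7.0)

print(worth_manual_training(network_only))  # caught, but Bayes never saw it
print(worth_manual_training(obvious_spam))  # already auto-learned
```

In this model the first message is the one Nels describes: flagged correctly, but never fed to Bayes, so training it by hand is what teaches Bayes about its content.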