On Wed, September 27, 2006 11:38 am, Nels Lindquist said:
> Daniel T. Staal wrote:
>
>> On Wed, September 27, 2006 11:10 am, Jim Maul said:
>>
>>> I believe that SA will not learn a message it has seen before so
>>> multiple sa-learn's will not have any effect.
>>
>> Actually, that was my impression too.
>>
>> Which means, for the original question, that re-learning the already
>> caught spams will have very little effect other than wasting some
>> processor cycles.  Doing what he is doing right now is probably best.
>
> Except that there's a significant difference between "already caught" and
> "already learned" spam.  The threshold for learning is much higher (and
> has specific requirements WRT point contributions of various types) so
> it's definitely possible to have, for example, a message that was
> correctly flagged as spam entirely due to network tests that was not
> auto-learned.  Training such messages then reinforces Bayes on the
> content side, so future messages that look similar but perhaps have a new
> URL that hasn't hit the blacklists yet can still be flagged.

True.  So...  The optimum is obviously to train, once and correctly, on
all messages.  Re-submitting a message that has already been trained will
consume *some* resources, but less than one that still needs to be learned.
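
To make the two points above concrete, here is a rough toy model in
Python.  It is only a sketch of the idea, not how SpamAssassin actually
stores or scores anything: the ToyLearner class, the
needs_manual_training helper, and the 5.0 / 12.0 thresholds are all
assumptions picked for illustration, and the real auto-learn decision
also looks at which kinds of rules contributed the points.

```python
import hashlib

# Assumed values for illustration only; check your own configuration.
SPAM_THRESHOLD = 5.0        # score at which a message is flagged as spam
AUTOLEARN_THRESHOLD = 12.0  # higher score needed before auto-learning


class ToyLearner:
    """Toy stand-in for a Bayes store that remembers what it has learned."""

    def __init__(self):
        self.seen = set()   # digests of messages already learned
        self.tokens = {}    # token -> number of spam messages containing it

    def learn_spam(self, message: str) -> bool:
        """Learn a message as spam; return False if it was already learned."""
        digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
        if digest in self.seen:
            # Already learned: we paid for the digest, but skip the update.
            return False
        self.seen.add(digest)
        for token in set(message.lower().split()):
            self.tokens[token] = self.tokens.get(token, 0) + 1
        return True


def needs_manual_training(score: float) -> bool:
    """True for a message that was flagged as spam but not auto-learned."""
    return SPAM_THRESHOLD <= score < AUTOLEARN_THRESHOLD
```

In this model a second learn_spam() call on the same text returns False
without touching the token counts, and a message scoring, say, 7.5 is
"caught" but would still benefit from being trained by hand.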

So the exact balance is a complicated question.  ;)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
