On 1/15/2013 4:39 PM, Bowie Bailey wrote:
> On 1/15/2013 4:27 PM, Ben Johnson wrote:
>> On 1/15/2013 4:05 PM, Bowie Bailey wrote:
>>> On 1/15/2013 3:47 PM, Ben Johnson wrote:
>>>> One final question on this subject (sorry...).
>>>>
>>>> Is there value in training Bayes on messages that SA classified as spam
>>>> *due to other test scores*? In other words, if a message is classified
>>>> as SPAM due to a block-list test, but the message is new enough for
>>>> Bayes to assign a zero score, should that message be kept and fed to
>>>> sa-learn so that Bayes can soak up all the tokens from a message that is
>>>> almost certainly spam (based on the other tests)?
>>>>
>>>> Am I making any sense?
>>> It is always worthwhile to train Bayes.  In an ideal world, you would
>>> hand-sort and train every email that comes through your system.  The
>>> more mail Bayes sees the more accurate it can be.
>>>
>> Thanks, Bowie. Given your response, would it then be prudent to call
>> "sa-learn --spam" on any message that *other tests* (non-Bayes tests)
>> determine to be spam (given some score threshold)?
> 
> That is exactly what the autolearn setting does.  I let my system run
> with the default autolearn settings.  Some people adjust the thresholds
> and some people prefer to turn off autolearn and do purely manual training.
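
For illustration, the autolearn decision described above can be sketched
roughly as follows. This is a simplified model, not SpamAssassin's actual
code. The threshold values are what I understand to be the defaults for
bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam
(12.0), and the 3-point header/body minimum follows the AutoLearnThreshold
plugin documentation. The function name and its arguments are hypothetical,
and the real plugin applies further conditions that are left out here.

```python
from typing import Optional

# Assumed defaults; check your own local.cf before relying on them.
NONSPAM_THRESHOLD = 0.1   # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0     # bayes_auto_learn_threshold_spam
MIN_HEADER_POINTS = 3.0   # minimum points from header tests to learn as spam
MIN_BODY_POINTS = 3.0     # minimum points from body tests to learn as spam


def autolearn_decision(non_bayes_score: float,
                       header_points: float,
                       body_points: float) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn).

    non_bayes_score is the message score with the Bayes rules left out,
    so that Bayes is not trained on its own verdicts.
    """
    if non_bayes_score < NONSPAM_THRESHOLD:
        return "ham"
    if (non_bayes_score >= SPAM_THRESHOLD
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    # Scores between the two thresholds are too uncertain to learn from.
    return None
```

The wide gap between the two thresholds is the point of the design: only
messages that the other tests are very sure about get learned automatically.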
> 
>> The crux of my question/point is that I don't want to have to feed
>> messages that Bayes "misses" but that other tests identify *correctly*
>> as spam to "sa-learn --spam".
> 
> At one point, I had a script running on my server that looked for
> messages that were marked as spam with a low Bayes rating (BAYES_00 to
> BAYES_40) or messages marked as ham with a high Bayes rating (BAYES_60
> to BAYES_99).  I was then able to check the messages and learn them
> properly.  This let me learn from the edge cases that were not being
> scored properly by Bayes while still making it to the correct folder due
> to other rules.
> 
> If you do this, you MUST check the messages yourself prior to learning
> since there is no other way to know whether they should be learned as
> ham or spam.
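
A script of the kind described above might look something like this minimal
sketch. It is not Bowie's actual script. It assumes each message carries an
X-Spam-Status header whose value starts with "Yes" or "No" and lists the rule
names that hit (for example BAYES_00); the function name needs_review is
hypothetical. It only flags messages for a human to check and never calls
sa-learn itself.

```python
import email
import re

# Bayes buckets treated as "low" and "high", matching the ranges
# BAYES_00 to BAYES_40 and BAYES_60 to BAYES_99 mentioned above.
LOW_BAYES = {"00", "05", "20", "40"}
HIGH_BAYES = {"60", "80", "95", "99"}


def needs_review(raw_message: str) -> bool:
    """Return True when the final verdict and the Bayes bucket disagree."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return False  # message was not scanned, nothing to compare

    match = re.search(r"\bBAYES_(\d{2})\b", status)
    if match is None:
        return False  # no Bayes rule hit on this message

    bucket = match.group(1)
    is_spam = status.strip().lower().startswith("yes")

    # Spam that Bayes thought was ham, or ham that Bayes thought was spam.
    return (is_spam and bucket in LOW_BAYES) or (
        not is_spam and bucket in HIGH_BAYES)
```

Messages for which this returns True would go into a review folder, to be
checked by hand and then fed to "sa-learn --spam" or "sa-learn --ham".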
> 
>> Is there value in implementing something like this? Or is there some
>> caveat that would make doing so self-defeating?
> 
> I find that Bayes autolearn works quite well for me, but others have had
> problems with it.
> 

Aaaaah... I get it. Finally. :)

Excellent info here; thanks again!

You guys are heroes... seriously.

Best regards,

-Ben
