On 1/15/2013 4:39 PM, Bowie Bailey wrote:
> On 1/15/2013 4:27 PM, Ben Johnson wrote:
>> On 1/15/2013 4:05 PM, Bowie Bailey wrote:
>>> On 1/15/2013 3:47 PM, Ben Johnson wrote:
>>>> One final question on this subject (sorry...).
>>>>
>>>> Is there value in training Bayes on messages that SA classified as spam
>>>> *due to other test scores*? In other words, if a message is classified
>>>> as SPAM due to a block-list test, but the message is new enough for
>>>> Bayes to assign a zero score, should that message be kept and fed to
>>>> sa-learn so that Bayes can soak-up all the tokens from a message that
>>>> is almost certainly spam (based on the other tests)?
>>>>
>>>> Am I making any sense?
>>> It is always worthwhile to train Bayes. In an ideal world, you would
>>> hand-sort and train every email that comes through your system. The
>>> more mail Bayes sees the more accurate it can be.
>>>
>> Thanks, Bowie. Given your response, would it then be prudent to call
>> "sa-learn --spam" on any message that *other tests* (non-Bayes tests)
>> determine to be spam (given some score threshold)?
>
> That is exactly what the autolearn setting does. I let my system run
> with the default autolearn settings. Some people adjust the thresholds
> and some people prefer to turn off autolearn and do purely manual training.
>
>> The crux of my question/point is that I don't want to have to feed
>> messages that Bayes "misses" but that other tests identify *correctly*
>> as spam to "sa-learn --spam".
>
> At one point, I had a script running on my server that looked for
> messages that were marked as spam with a low Bayes rating (BAYES_00 to
> BAYES_40) or messages marked as ham with a high Bayes rating (BAYES_60
> to BAYES_99). I was then able to check the messages and learn them
> properly. This let me learn from the edge cases that were not being
> scored properly by Bayes while still making it to the correct folder due
> to other rules.
>
> If you do this, you MUST check the messages yourself prior to learning
> since there is no other way to know whether they should be learned as
> ham or spam.
>
>> Is there value in implementing something like this? Or is there some
>> caveat that would make doing so self-defeating?
>
> I find that Bayes autolearn works quite well for me, but others have had
> problems with it.
>
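
For anyone reading along later, here is a rough sketch of the kind of check Bowie describes. It is not his actual script. It assumes SA has already tagged each message with the usual X-Spam-Status header (something like "Yes, score=7.2 required=5.0 tests=BAYES_00,RCVD_IN_XBL ..."). The function name and the two labels it returns are made up for illustration.

```python
import re
from email import message_from_string

# SpamAssassin names its Bayes rules BAYES_00 ... BAYES_99; capture the number.
BAYES_RULE = re.compile(r"\bBAYES_(\d+)\b")


def bayes_disagreement(raw_message):
    """Return a label if the Bayes rule disagrees with the overall verdict.

    Hypothetical helper: the labels are illustrative only. Assumes the
    message carries an X-Spam-Status header written by SpamAssassin.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None  # message was never scanned

    match = BAYES_RULE.search(status)
    if match is None:
        return None  # no Bayes rule fired, nothing to compare

    bayes = int(match.group(1))
    is_spam = status.lstrip().lower().startswith("yes")

    # Marked as spam by other rules, but Bayes rated it low (BAYES_00..BAYES_40).
    if is_spam and bayes <= 40:
        return "spam-missed-by-bayes"
    # Marked as ham overall, but Bayes rated it high (BAYES_60..BAYES_99).
    if not is_spam and bayes >= 60:
        return "ham-flagged-by-bayes"
    return None
```

Anything this flags is only a candidate for review. As Bowie says, each message still has to be checked by hand before it goes to "sa-learn --spam" or "sa-learn --ham".
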
Aaaaah... I get it. Finally. :)

Excellent info here; thanks again! You guys are heroes... seriously.

Best regards,

-Ben