Re: Bayes autolearn questions
On 09/09/2014 03:50 PM, Alex Regan wrote:
> Hi,
>
> > > > > Did you understand that all tokens are learned, regardless of
> > > > > whether they have been seen before?
> > > >
> > > > That doesn't really matter from a user perspective, though,
> > > > right? I mean, if tokens that have already been learned are
> > > > learned again, the net result is zero.
> > >
> > > Very much not zero. Each token has several values associated with it:
> > >   # ham
> > >   # spam
> > >   time-stamp
> > >
> > > So each time it's learned, its respective ham/spam counter is
> > > incremented, which indicates how spammy or hammy a given token is,
> > > and its time-stamp is updated, indicating how "fresh" a token is.
> > > The bayes expiry process removes "stale" tokens when it does its
> > > job to prune the database down to size.
>
> Ah, yes, of course. I knew about that, but somehow didn't put it
> together with this.
>
> I would like to know why, after training similar messages a number of
> times, it still shows the same bayes score on new similar messages. I'd
> also like to figure out why, or how many more times it's necessary for
> a message to be re-trained to reflect the new desired persuasion.
>
> I've had this particular FN frequently with a bayes50, sometimes lower.
> There are also a few dozen every day that are tagged as spam properly,
> but still have bayes50. I pull them out of the quarantine and keep
> training them as spam, but there's still a few that get through every
> day.
>
> Is there any particular analysis I can do on one of the FNs that can
> tell me how far off the bayes50 is from becoming bayes99 in a similar
> message?
>
> Hopefully that's clear. I understand there's a large number of
> variables involved here, and I would think the fewer the tokens in a
> message, the more difficult it should be to persuade, but it's
> frustrating to see bayes50 so repeatedly...

You could add

  report BAYES_HT _HAMMYTOKENS(50)_
  report BAYES_ST _SPAMMYTOKENS(50)_

to your local.cf to add a header report & see what tokens are being seen.
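The per-token bookkeeping quoted above (a ham count, a spam count, and a time-stamp) can be sketched as a toy model. This is purely an illustration of the idea; the class and field names are made up and do not reflect SpamAssassin's actual database layout:

```python
import time

class TokenRecord:
    """Toy model of a Bayes token record: a ham count, a spam count and
    a last-seen timestamp (hypothetical names, illustration only)."""

    def __init__(self):
        self.ham = 0
        self.spam = 0
        self.atime = 0.0  # last time this token was (re-)learned

    def learn(self, as_spam, now=None):
        # Re-learning a known token is NOT a no-op: its counter grows
        # (more spammy/hammy weight) and its timestamp is refreshed,
        # so the expiry process keeps it as a "fresh" token.
        now = time.time() if now is None else now
        if as_spam:
            self.spam += 1
        else:
            self.ham += 1
        self.atime = max(self.atime, now)

rec = TokenRecord()
for _ in range(3):          # train the same token as spam three times
    rec.learn(as_spam=True)
print(rec.spam, rec.ham)    # 3 0
```

Repeated training thus shifts a token's spam/ham ratio rather than being a net zero, which is why continued `sa-learn` runs on similar FNs do eventually move the BAYES_xx score.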
Re: Bayes autolearn questions
Hi,

> > > Did you understand that all tokens are learned, regardless of
> > > whether they have been seen before?
> >
> > That doesn't really matter from a user perspective, though, right? I
> > mean, if tokens that have already been learned are learned again, the
> > net result is zero.
>
> Very much not zero. Each token has several values associated with it:
>   # ham
>   # spam
>   time-stamp
>
> So each time it's learned, its respective ham/spam counter is
> incremented, which indicates how spammy or hammy a given token is, and
> its time-stamp is updated, indicating how "fresh" a token is. The bayes
> expiry process removes "stale" tokens when it does its job to prune the
> database down to size.

Ah, yes, of course. I knew about that, but somehow didn't put it
together with this.

I would like to know why, after training similar messages a number of
times, it still shows the same bayes score on new similar messages. I'd
also like to figure out why, or how many more times it's necessary for a
message to be re-trained to reflect the new desired persuasion.

I've had this particular FN frequently with a bayes50, sometimes lower.
There are also a few dozen every day that are tagged as spam properly,
but still have bayes50. I pull them out of the quarantine and keep
training them as spam, but there's still a few that get through every
day.

Is there any particular analysis I can do on one of the FNs that can
tell me how far off the bayes50 is from becoming bayes99 in a similar
message?

Hopefully that's clear. I understand there's a large number of variables
involved here, and I would think the fewer the tokens in a message, the
more difficult it should be to persuade, but it's frustrating to see
bayes50 so repeatedly...

Thanks,
Alex
Re: Bayes autolearn questions
Hi,

> > > Please use plain-text rather than HTML. In particular with that
> > > really bad indentation format of quoting.
> >
> > It doesn't seem possible with gmail directly any longer, so I've set
> > up thunderbird for this. Maybe it is, but not after clicking around
> > in the obvious places.
>
> It's possible. A little googling reveals how: When composing a message
> (or reply), click the little downward-facing triangle on the bottom
> right of the compose box (next to the trash can). From the pop-up menu,
> click "plain text mode." Haven't tried it personally, but seems like it
> should work as advertised.

That looks like it, thanks. I figured it would be with the rest of the
fonts and formatting section. It's a per-message thing, though. It was
just easier to set up Thunderbird anyway.

Thanks,
Alex
Re: Bayes autolearn questions
On Sep 8, 2014, at 7:17 PM, Alex Regan wrote:

> > Please use plain-text rather than HTML. In particular with that
> > really bad indentation format of quoting.
>
> It doesn't seem possible with gmail directly any longer, so I've set up
> thunderbird for this. Maybe it is, but not after clicking around in the
> obvious places.

It's possible. A little googling reveals how: When composing a message
(or reply), click the little downward-facing triangle on the bottom
right of the compose box (next to the trash can). From the pop-up menu,
click "plain text mode." Haven't tried it personally, but seems like it
should work as advertised.

---
Amir
Re: Bayes autolearn questions
On Mon, 8 Sep 2014, Alex Regan wrote:

> > Did you understand that the number of previously not seen tokens has
> > absolutely nothing to do with auto-learning?
>
> Yes, that was a mistake.
>
> > Did you understand that all tokens are learned, regardless of whether
> > they have been seen before?
>
> That doesn't really matter from a user perspective, though, right? I
> mean, if tokens that have already been learned are learned again, the
> net result is zero.

Very much not zero. Each token has several values associated with it:
  # ham
  # spam
  time-stamp

So each time it's learned, its respective ham/spam counter is
incremented, which indicates how spammy or hammy a given token is, and
its time-stamp is updated, indicating how "fresh" a token is. The bayes
expiry process removes "stale" tokens when it does its job to prune the
database down to size.

Thus learning a token multiple times increases its weight and keeps it
"fresh", so it is kept as an active/relevant piece of info.

-- 
Dave Funk
University of Iowa College of Engineering
319/335-5751  FAX: 319/384-0549
1256 Seamans Center
Sys_admin/Postmaster/cell_admin
Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
Re: Bayes autolearn questions
Hi,

> Please use plain-text rather than HTML. In particular with that really
> bad indentation format of quoting.

It doesn't seem possible with gmail directly any longer, so I've set up
thunderbird for this. Maybe it is, but not after clicking around in the
obvious places.

> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > >
> > > Isn't that sufficient for auto-learning this message as spam?
> >
> > That's clearly referring to the _TOKEN_ data in the custom header,
> > is it not?
> >
> > Yes. Burning the candle at both ends. Really overworked.
>
> Sorry to hear. Nonetheless, did you take the time to really understand
> my explanations? It seems you sometimes didn't in the past, and I am
> not happy to waste my time on other people's problems if they aren't
> following thoroughly.

Yes, always. It may not be immediately, but the time you give up to do
this is not lost on me. My brain sometimes goes faster than I can
explain myself properly. I make too many assumptions about what people
understand about me, my abilities, and my comprehension of a topic.

> > > Learning is not limited to new tokens. All tokens are learned,
> > > regardless of their current (h|sp)ammyness.
> > >
> > > Still, the number of (new) tokens is not a condition for
> > > auto-learning. That header shows some more or less nice
> > > information, but in this context absolutely irrelevant information.
> >
> > I understood "new" to mean the tokens that have not been seen before,
> > and would be learned if the other conditions were met.
>
> Well, yes. So what? Did you understand that the number of previously
> not seen tokens has absolutely nothing to do with auto-learning?

Yes, that was a mistake.

> Did you understand that all tokens are learned, regardless of whether
> they have been seen before?

That doesn't really matter from a user perspective, though, right? I
mean, if tokens that have already been learned are learned again, the
net result is zero.

> This whole part is entirely unrelated to auto-learning and your
> original question.

Yes, I see that, and much of it comes down to not explaining myself
properly originally. I really only meant to tie it in with the tokens
that would be learned had it been determined that autolearning would be
taking place. I understand now that all the tokens are learned always
anyway.

> > As I have mentioned before in this thread: It is NOT the message's
> > reported total score that must exceed the threshold. The
> > auto-learning discriminator uses an internally calculated score using
> > the respective non-Bayes scoreset.

Very helpful, thanks. Is there a way to see more about how it makes that
decision on a particular message?

> spamassassin -D learn
>
> Unsurprisingly, the -D debug option shows information on that decision.
> In this case, limiting debug output to the 'learn' area comes in handy,
> eliminating the noise. The output includes the important details, like
> the auto-learn decision with a human-readable explanation, the score
> computed for autolearn, as well as head and body points.

It's been a long time since I've gone through the debug output for bayes
info, but I have done that. Only now, I'll have a little better
understanding of what it means, and can start to improve my overall
understanding of the bayes component of spamassassin.

Hopefully others also benefited from this crazy thread as much as I did.

Thanks,
Alex
Re: Bayes autolearn questions
Please use plain-text rather than HTML. In particular with that really
bad indentation format of quoting.

On Sat, 2014-09-06 at 17:22 -0400, Alex wrote:
> On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann wrote:
> > On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> >
> > > > > I looked in the quarantined message, and according to the
> > > > > _TOKEN_ header I've added:
> > > > >
> > > > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > > >
> > > > > Isn't that sufficient for auto-learning this message as spam?
> >
> > That's clearly referring to the _TOKEN_ data in the custom header,
> > is it not?
>
> Yes. Burning the candle at both ends. Really overworked.

Sorry to hear. Nonetheless, did you take the time to really understand
my explanations? It seems you sometimes didn't in the past, and I am not
happy to waste my time on other people's problems if they aren't
following thoroughly.

> > > > That has absolutely nothing to do with auto-learning. Where did
> > > > you get the impression it might?
> > >
> > > If the conditions for autolearning had been met, I understood that
> > > it would be those new tokens that would be learned.
> >
> > Learning is not limited to new tokens. All tokens are learned,
> > regardless of their current (h|sp)ammyness.
> >
> > Still, the number of (new) tokens is not a condition for
> > auto-learning. That header shows some more or less nice information,
> > but in this context absolutely irrelevant information.
>
> I understood "new" to mean the tokens that have not been seen before,
> and would be learned if the other conditions were met.

Well, yes. So what? Did you understand that the number of previously not
seen tokens has absolutely nothing to do with auto-learning? Did you
understand that all tokens are learned, regardless of whether they have
been seen before?

This whole part is entirely unrelated to auto-learning and your original
question.

> > Auto-learning in a nutshell: Take all tests hit. Drop some of them
> > with certain tflags, like the BAYES_xx rules. For the remaining
> > rules, look up their scores in the non-Bayes scoreset 0 or 1. Sum up
> > those scores to a total, and compare with the auto-learn threshold
> > values. For spam, also check there are at least 3 points each by
> > header and body rules. Finally, if all that matches, learn.
>
> Is it important to understand how those three points are achieved or
> calculated?

In most cases, no, I guess. Though that is really just a distinction
usually easy to make based on the rule's type: header vs body-ish rule
definitions. If the re-calculated total score in scoreset 0 or 1 exceeds
the auto-learn threshold but the message still is not learned -- then it
is important. Unless you trust the auto-learn discriminator to not cheat
on you.

> > > Okay, of course I understood the difference between points and
> > > tokens. Since the points were over the specified threshold, I
> > > thought those new tokens would have been added.
> >
> > As I have mentioned before in this thread: It is NOT the message's
> > reported total score that must exceed the threshold. The
> > auto-learning discriminator uses an internally calculated score using
> > the respective non-Bayes scoreset.
>
> Very helpful, thanks. Is there a way to see more about how it makes
> that decision on a particular message?

spamassassin -D learn

Unsurprisingly, the -D debug option shows information on that decision.
In this case, limiting debug output to the 'learn' area comes in handy,
eliminating the noise. The output includes the important details, like
the auto-learn decision with a human-readable explanation, the score
computed for autolearn, as well as head and body points.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){
putchar(t[s]);h=m;s=0; }}}
Re: Bayes autolearn questions
Hi,

On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann wrote:
> On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
>
> > > > I looked in the quarantined message, and according to the
> > > > _TOKEN_ header I've added:
> > > >
> > > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > > >
> > > > Isn't that sufficient for auto-learning this message as spam?
>
> That's clearly referring to the _TOKEN_ data in the custom header, is
> it not?

Yes. Burning the candle at both ends. Really overworked.

> > > That has absolutely nothing to do with auto-learning. Where did you
> > > get the impression it might?
> >
> > If the conditions for autolearning had been met, I understood that it
> > would be those new tokens that would be learned.
>
> Learning is not limited to new tokens. All tokens are learned,
> regardless of their current (h|sp)ammyness.
>
> Still, the number of (new) tokens is not a condition for auto-learning.
> That header shows some more or less nice information, but in this
> context absolutely irrelevant information.

I understood "new" to mean the tokens that have not been seen before,
and would be learned if the other conditions were met.

> Auto-learning in a nutshell: Take all tests hit. Drop some of them with
> certain tflags, like the BAYES_xx rules. For the remaining rules, look
> up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores
> to a total, and compare with the auto-learn threshold values. For spam,
> also check there are at least 3 points each by header and body rules.
> Finally, if all that matches, learn.

Is it important to understand how those three points are achieved or
calculated?

> > Okay, of course I understood the difference between points and
> > tokens. Since the points were over the specified threshold, I thought
> > those new tokens would have been added.
>
> As I have mentioned before in this thread: It is NOT the message's
> reported total score that must exceed the threshold. The auto-learning
> discriminator uses an internally calculated score using the respective
> non-Bayes scoreset.

Very helpful, thanks. Is there a way to see more about how it makes that
decision on a particular message?

Thanks,
Alex
Re: Bayes autolearn questions
On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
> > > I looked in the quarantined message, and according to the _TOKEN_
> > > header I've added:
> > >
> > > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> > >
> > > Isn't that sufficient for auto-learning this message as spam?

That's clearly referring to the _TOKEN_ data in the custom header, is it
not?

> > That has absolutely nothing to do with auto-learning. Where did you
> > get the impression it might?
>
> If the conditions for autolearning had been met, I understood that it
> would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned,
regardless of their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context absolutely irrelevant information.

Auto-learning in a nutshell: Take all tests hit. Drop some of them with
certain tflags, like the BAYES_xx rules. For the remaining rules, look
up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
a total, and compare with the auto-learn threshold values. For spam,
also check there are at least 3 points each by header and body rules.
Finally, if all that matches, learn.

> Okay, of course I understood the difference between points and tokens.
> Since the points were over the specified threshold, I thought those
> new tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.
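The "auto-learning in a nutshell" description above can be sketched roughly as follows. The rule names, scores and the NOAUTOLEARN set here are invented for illustration; the real decision (including autolearn_force, tflags handling, ham thresholds, etc.) is made by SpamAssassin's AutoLearnThreshold plugin, not this code:

```python
# Sketch of the auto-learn spam decision described above (illustrative
# only; not the actual plugin code).
NOAUTOLEARN = {"BAYES_50", "BAYES_99"}  # stand-in for tflags-excluded rules

def autolearn_as_spam(hits, nonbayes_scores, threshold_spam=9.0):
    """hits: list of (rule, kind) with kind 'header' or 'body';
    nonbayes_scores: per-rule scores from scoreset 0/1 -- NOT the
    scores reported on the message itself."""
    total = header_pts = body_pts = 0.0
    for rule, kind in hits:
        if rule in NOAUTOLEARN:
            continue                    # Bayes must not feed itself
        s = nonbayes_scores[rule]
        total += s
        if kind == "header":
            header_pts += s
        elif kind == "body":
            body_pts += s
    # Spam additionally needs >= 3 points each from header and body rules.
    return (total >= threshold_spam
            and header_pts >= 3.0 and body_pts >= 3.0)

# A message can score well past the threshold as reported, yet fail here:
hits = [("BAYES_50", "body"), ("RCVD_IN_PSBL", "header"), ("LOC_SHORT", "body")]
scores = {"RCVD_IN_PSBL": 2.3, "LOC_SHORT": 3.1}
print(autolearn_as_spam(hits, scores))  # False: non-Bayes total is only 5.4
```

The example shows both failure modes from the thread at once: the Bayes rule is dropped, and the remaining non-Bayes points fall short of the threshold and of the 3-point header/body minimums.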
Re: Bayes autolearn questions
Hi,

> > However, spam with scores greater than 9.0 aren't being autolearned:
>
> http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

> > Sep 2 21:01:51 mail01 amavis[25938]: (25938-10)
> > header_edits_for_quar: -> , Yes, score=16.519 tag=-200 tag2=5 kill=5
> > tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LINKBAIT=5,
> > LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1, RCVD_IN_BL_SPAMCOP_NET=1.347,
> > RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.3,
> > RCVD_IN_UCEPROTECT1=0.01, RCVD_IN_UCEPROTECT2=0.01, RDNS_NONE=0.793,
> > RELAYCOUNTRY_CN=0.1, RELAYCOUNTRY_HIGH=0.5, SAGREY=0.01] autolearn=no
> > autolearn_force=no
> >
> > I've re-read the autolearn section of the docs,
>
> The one I linked to above?

Yes, and the FAQ entry regarding reasons why autolearn doesn't work.

> > I looked in the quarantined message, and according to the _TOKEN_
> > header I've added:
> >
> > X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
> >
> > Isn't that sufficient for auto-learning this message as spam?
>
> That has absolutely nothing to do with auto-learning. Where did you get
> the impression it might?

If the conditions for autolearning had been met, I understood that it
would be those new tokens that would be learned.

> > I just wanted to be sure this is just a case of not enough new points
> > (tokens?) for the message to be learned, and that I wasn't doing
> > something wrong.
>
> Points: aka score, used in the context of per-rule (per-test) and
> overall score classifying a message based on the required_score
> setting.
>
> Token: think of it as a "word" used by the Bayesian classifier
> sub-system. In practice, it is more complicated than simply
> space-separated words. Context (e.g. headers) and case might be taken
> into account, too.

Okay, of course I understood the difference between points and tokens.
Since the points were over the specified threshold, I thought those new
tokens would have been added.

I'll continue reading and experimenting. Posting very late again. Thanks
guys for your help, as always.

Thanks,
Alex
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 21:16 -0600, LuKreme wrote:
> On 02 Sep 2014, at 20:50, Karsten Bräckelmann wrote:
> > On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> > > I believe the score threshold is the base score WITHOUT bayes.
> > >
> > > Try running the email through with a -D flag and see what you get.
> > >
> > > (And that is only a partial answer, the threshold number ignores
> > > certain classes of tests beyond bayes, but I don't remember which
> > > ones. It's unfortunate that the learn_threshold_spam uses a number
> > > that appears to be related to the spam score, because it isn't.)
> >
> > It is. Using the accompanying, non-Bayes score-set. To avoid direct
> > Bayes self-feeding, and other rules' indirect self-feeding due to
> > Bayes-enabled scores.
> >
> > BTW, if one knows of that mysterious (bayes_auto_)
> > learn_threshold_spam you mentioned, one found the AutoLearnThreshold
> > doc mentioning exactly that: Bayes auto-learning is based on
> > non-Bayes scores.
>
> But that is not the case. You can have a score without bayes that
> exceeds the threshold and still have the message not auto-learned.

True. I chose not to repeat myself highlighting the details and
mentioning the constraint of header and body rules' points. See my other
post to this thread, half an hour earlier. And the docs.
Re: Bayes autolearn questions
On 02 Sep 2014, at 20:50, Karsten Bräckelmann wrote:
> On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> > On 02 Sep 2014, at 19:11, Alex wrote:
> >
> > > However, spam with scores greater than 9.0 aren't being autolearned:
> >
> > I believe the score threshold is the base score WITHOUT bayes.
> >
> > Try running the email through with a -D flag and see what you get.
> >
> > (And that is only a partial answer, the threshold number ignores
> > certain classes of tests beyond bayes, but I don't remember which
> > ones. It's unfortunate that the learn_threshold_spam uses a number
> > that appears to be related to the spam score, because it isn't.)
>
> It is. Using the accompanying, non-Bayes score-set. To avoid direct
> Bayes self-feeding, and other rules' indirect self-feeding due to
> Bayes-enabled scores.
>
> BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
> you mentioned, one found the AutoLearnThreshold doc mentioning exactly
> that: Bayes auto-learning is based on non-Bayes scores.

But that is not the case. You can have a score without bayes that
exceeds the threshold and still have the message not auto-learned.

-- 
'They're the cream!' Rincewind sighed. 'Cohen, they're the cheese.'
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
> On 02 Sep 2014, at 19:11, Alex wrote:
>
> > However, spam with scores greater than 9.0 aren't being autolearned:
>
> I believe the score threshold is the base score WITHOUT bayes.
>
> Try running the email through with a -D flag and see what you get.
>
> (And that is only a partial answer, the threshold number ignores
> certain classes of tests beyond bayes, but I don't remember which ones.
> It's unfortunate that the learn_threshold_spam uses a number that
> appears to be related to the spam score, because it isn't.)

It is. Using the accompanying, non-Bayes score-set. To avoid direct
Bayes self-feeding, and other rules' indirect self-feeding due to
Bayes-enabled scores.

BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
you mentioned, one found the AutoLearnThreshold doc mentioning exactly
that: Bayes auto-learning is based on non-Bayes scores.
Re: Bayes autolearn questions
On 02 Sep 2014, at 19:11, Alex wrote:

> However, spam with scores greater than 9.0 aren't being autolearned:

I believe the score threshold is the base score WITHOUT bayes.

Try running the email through with a -D flag and see what you get.

(And that is only a partial answer, the threshold number ignores certain
classes of tests beyond bayes, but I don't remember which ones. It's
unfortunate that the learn_threshold_spam uses a number that appears to
be related to the spam score, because it isn't.)

-- 
It's like a cow's opinion. It just doesn't matter. It's moo
Re: Bayes autolearn questions
On Tue, 2014-09-02 at 21:11 -0400, Alex wrote:
> I have a spamassassin-3.4 system with the following bayes config:
>
> required_hits 5.0
> rbl_timeout 8
> use_bayes 1
> bayes_auto_learn 1
> bayes_auto_learn_on_error 1
> bayes_auto_learn_threshold_spam 9.0
> bayes_expiry_max_db_size 950
> bayes_auto_expire 0
>
> However, spam with scores greater than 9.0 aren't being autolearned:

http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

> Sep 2 21:01:51 mail01 amavis[25938]: (25938-10)
> header_edits_for_quar: -> , Yes, score=16.519 tag=-200 tag2=5 kill=5
> tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LINKBAIT=5,
> LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1, RCVD_IN_BL_SPAMCOP_NET=1.347,
> RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.3,
> RCVD_IN_UCEPROTECT1=0.01, RCVD_IN_UCEPROTECT2=0.01, RDNS_NONE=0.793,
> RELAYCOUNTRY_CN=0.1, RELAYCOUNTRY_HIGH=0.5, SAGREY=0.01] autolearn=no
> autolearn_force=no
>
> I've re-read the autolearn section of the docs,

The one I linked to above?

> and don't see any reason why this 16-point email wouldn't have any new
> tokens to be learned?

Rules with certain tflags are ignored when determining whether a message
should be trained upon. Most notably here, BAYES_xx. Moreover, the
auto-learning decision occurs using scores from either scoreset 0 or 1,
that is, using scores of a non-Bayes scoreset. IOW, the message's score
of 16 is irrelevant, since the auto-learn algorithm uses different
scores per rule.

The next safety net is requiring at least 3 points each from header and
body rules, unless autolearn_force is enabled. Which it is not in your
sample. Either of those could have prevented auto-learning.

Also, according to your wording, you seem to think in terms of (number
of) "new tokens to be learned". Which has nothing in common with
auto-learning. (Even worse, "new tokens" would strongly apply to random
gibberish strings, hapaxes in Bayes context. Which are commonly ignored
in Bayes classification.)

> I looked in the quarantined message, and according to the _TOKEN_
> header I've added:
>
> X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
>
> Isn't that sufficient for auto-learning this message as spam?

That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?

> I just wanted to be sure this is just a case of not enough new points
> (tokens?) for the message to be learned, and that I wasn't doing
> something wrong.

Points: aka score, used in the context of per-rule (per-test) and
overall score classifying a message based on the required_score setting.

Token: think of it as a "word" used by the Bayesian classifier
sub-system. In practice, it is more complicated than simply
space-separated words. Context (e.g. headers) and case might be taken
into account, too.
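The point that a token is more than a space-separated word (header context and case can matter) can be illustrated with a toy tokenizer. This is a sketch only; SpamAssassin's real tokenizer is far more elaborate, and the `H*` prefix format here is merely suggestive, not the actual token format:

```python
import re

def toy_tokens(body, headers):
    """Toy Bayes tokenizer (illustration only): body words become plain
    tokens, while header words are prefixed with their header name, so
    the same word in a Subject and in the body yields distinct tokens."""
    tokens = set(re.findall(r"[A-Za-z0-9$!]+", body))
    for name, value in headers.items():
        for word in re.findall(r"[A-Za-z0-9$!]+", value):
            tokens.add(f"H*{name}:{word}")
    return tokens

t = toy_tokens("Cheap meds here", {"Subject": "Cheap meds"})
print(sorted(t))
# ['Cheap', 'H*Subject:Cheap', 'H*Subject:meds', 'here', 'meds']
```

This is why the hammy/spammy/neutral counts in a custom _TOKEN_ header can look surprising: the classifier is counting context-qualified tokens, not the words you see in the message body.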
Bayes autolearn questions
Hi,

I have a spamassassin-3.4 system with the following bayes config:

required_hits 5.0
rbl_timeout 8
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_on_error 1
bayes_auto_learn_threshold_spam 9.0
bayes_expiry_max_db_size 950
bayes_auto_expire 0

However, spam with scores greater than 9.0 aren't being autolearned:

Sep 2 21:01:51 mail01 amavis[25938]: (25938-10) header_edits_for_quar:
< bmu011...@bmu-011.hichina.com> -> , Yes, score=16.519 tag=-200 tag2=5
kill=5 tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LINKBAIT=5,
LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1, RCVD_IN_BL_SPAMCOP_NET=1.347,
RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.3, RCVD_IN_UCEPROTECT1=0.01,
RCVD_IN_UCEPROTECT2=0.01, RDNS_NONE=0.793, RELAYCOUNTRY_CN=0.1,
RELAYCOUNTRY_HIGH=0.5, SAGREY=0.01] autolearn=no autolearn_force=no

I've re-read the autolearn section of the docs, and don't see any reason
why this 16-point email wouldn't have any new tokens to be learned?

I looked in the quarantined message, and according to the _TOKEN_ header
I've added:

X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.

Isn't that sufficient for auto-learning this message as spam?

I just wanted to be sure this is just a case of not enough new points
(tokens?) for the message to be learned, and that I wasn't doing
something wrong.

Thanks,
Alex