Re: SA-learn (spamassassin)
On 03.08.09 09:43, monolit wrote:
> If you are so clever (because I am a bad English speaker), you can explain
> this problem to me by mail (in Slovak). Is that a problem for you? I did not
> find enough good material on this topic in Czech.

Well,
- do not train on fake messages.
- do not modify bayes_min_*_num.
- train on a lot of spam and on a lot of ham.

You can also keep the trained corpus somewhere for later revision (if you trained anything incorrectly) or to refill the Bayes DB if you lose it.

I don't know if there's anything to translate. You should understand that in any mail there are many tokens you don't even notice.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
How does cat play with mouse? cat /dev/mouse
Re: SA-learn (spamassassin)
On Tue, 4 Aug 2009 14:39:44 +0200, Matus UHLAR - fantomas uh...@fantomas.sk wrote:
> On 03.08.09 09:43, monolit wrote:
> > If you are so clever (because I am a bad English speaker), you can explain
> > this problem to me by mail (in Slovak). Is that a problem for you? I did
> > not find enough good material on this topic in Czech.
>
> Well,
> - do not train on fake messages.

why not?

> - do not modify bayes_min_*_num

why not?

> - train on a lot of spam and on a lot of ham.

Bayes just needs to see as many samples as possible of what is spam and what is not spam. Keeping nham and nspam nearly equal is a sign of good training, imho.

> You can also keep the trained corpus somewhere for later revision (if you
> trained anything incorrectly) or to refill the Bayes DB if you lose it.

Well, it might be good advice, but spam changes every day, which is also why spammers have so much success getting past SpamAssassin.

> I don't know if there's anything to translate.

maybe not

> You should understand that in any mail there are many tokens you don't
> even notice.

That's the point with Bayes.

--
Benny Pedersen
Re: SA-learn (spamassassin)
Good morning.

After Bayes learning, the output of sa-learn --dump magic shows nspam/nham increased by 1. I tried the command several times.

I wrote a mail with Subject: viagra and body: viagra and sent it from my first account to my second account (score 0.4). Then I ran sa-learn --spam on this mail. I wrote the same mail again and sent it from the first account to the second. The mail got a higher score, 2.4. I took this mail and ran sa-learn --spam on it. I wrote the same mail again and repeated the sending (from the first account to the second). The score was higher again, 3.4. I tried this several more times, but the score did not grow any further... That was my small experiment with Bayes scoring.

My spamd process runs as root. I started sa-learn as root. BUT there is a database in the /root directory and the same database in the /home/spamfilter directory. spamfilter is the user that is named in master.cf. In SpamAssassin (local.cf) I have an entry for the Bayes database, and the path is /home/spamfilter... When I ran sa-learn as root, I checked the modification time of the databases. The database under the spamfilter user is updated correctly (the one under root is not updated).

I know it is strange and confusing to use two users for this. I wish everything ran under one user, but I don't know how to start spamd as spamfilter. I am not sure whether that would be right... maybe spamd should run as root.

Here is my modification to master.cf (Postfix). This modification is recommended by the SpamAssassin web pages.

    smtp      inet  n       -       n       -       -       smtpd
      -o content_filter=spamfilter:dummy
    # Interfaces to non-Postfix software. Be sure to examine the manual
    # pages of the non-Postfix software to find out what options it wants.
    #
    spamfilter unix -       n       n       -       -       pipe
      flags=Rq user=spamfilter argv=/usr/local/bin/spamfilter -f ${sender} -- ${recipient}

Thank you for explaining how Bayes works and for the time you devoted to me.
-- View this message in context: http://www.nabble.com/SA-learn-%28spamassassin%29-tp24773517p24786173.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: SA-learn (spamassassin)
On Sun, 2009-08-02 at 23:50 -0700, monolit wrote:
> Good morning. After Bayes learning, the output of sa-learn --dump magic
> shows nspam/nham increased by 1. I tried the command several times. I wrote
> a mail with

I did not ask for the difference. I asked for the output of the command.

> Subject: viagra and body: viagra and sent it from my first account to my
> second account (score 0.4). Then I ran sa-learn --spam on this mail. I wrote

As I told you before, there are *lots* of other tokens, which differ greatly between your self-written messages and spam. Measuring Bayes by observing a single token is broken.

> the same mail again and sent it from the first account to the second. The
> mail got a higher score, 2.4. I took this mail and ran sa-learn --spam on
> it. I wrote the same mail again and repeated the sending (from the first
> account to the second). The score was higher again, 3.4. I tried this
> several more times, but the score did not grow any further... That was my
> small experiment with Bayes scoring.

Without looking at the headers and the SA rules hit, there's no evidence Bayes did anything at all. As described, this easily could be AWL, too.

Oh my, this horse is dead anyway.

> Thank you for explaining how Bayes works and for the time you devoted to me.

--
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: SA-learn (spamassassin)
I gave you the output of the command sa-learn --dump magic. About the end of your reply... it could not be AWL, because I have AWL disabled. I was lucky today... my boss was busy. I will present my solution tomorrow :)
Re: SA-learn (spamassassin)
On 03.08.09 09:20, monolit wrote:
> I gave you the output of the command sa-learn --dump magic. About the end
> of your reply... it could not be AWL, because I have AWL disabled. I was
> lucky today... my boss was busy. I will present my solution tomorrow :)

Better not to present it until you finally understand how it works...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphilis. Good thing we have penicillin." - Matthew Alton
Re: SA-learn (spamassassin)
On Mon, 2009-08-03 at 09:20 -0700, an anonymous Nabble user wrote:
> I gave you the output of the command sa-learn --dump magic.

No, you did NOT provide the output.

But hey, there's no point in arguing over this or further following up with this thread anyway.
Re: SA-learn (spamassassin)
If you are so clever (because I am a bad English speaker), you can explain this problem to me by mail (in Slovak). Is that a problem for you? I did not find enough good material on this topic in Czech.
Re: SA-learn (spamassassin)
Dear Karl, I don't know which command output you need. You said sa-learn --dump magic. What am I supposed to make of your request? I am totally confused by you...
Re: SA-learn (spamassassin)
On Mon, 3 Aug 2009 09:43:49 -0700 (PDT), monolit xmull...@gmail.com wrote:
> If you are so clever (because I am a bad English speaker), you can explain
> this problem to me by mail (in Slovak). Is that a problem for you? I did not
> find enough good material on this topic in Czech.

It's hard to help if we don't both understand what to do; it's not your bad Danish that is the problem either.

--
Benny Pedersen
Re: SA-learn (spamassassin)
On Mon, 2009-08-03 at 09:49 -0700, an annoying Nabble user wrote:
> Dear Karl, I don't know which command output you need. You said
> sa-learn --dump magic. What am I supposed to make of your request? I am
> totally confused by you...

Karl?

Dear anonymous Nabble user, what I requested from you is the output of that command. The actual output you get when running the command. Not a statement by you that you have run the command -- but the output, copied and pasted.

You do know how to copy-n-paste, don't you? Oh, yeah, you do. We established that before. After all, this entire thread started as a copy-n-paste from an Ubuntu forum.

Anyway, end-of-thread for me. Don't bother sending the output now.
Re: SA-learn (spamassassin)
I am really sorry, I am tired from work. I made a mistake with your name. This task is serious, so please don't joke ("Oh, yeah, you do. We established that before. After all, this entire thread started as a copy-n-paste from an Ubuntu forum."). I started the thread here and then copied it to the Ubuntu forum. As I said, I urgently need help. ...and I am not anonymous, I have a nick...

The command didn't run!

    [r...@localhost 3.002005]# sa-learn --dump magic
    //start
    Unrecognized escape \g passed through in regex; marked by <-- HERE in m/(?i)\g <-- HERE irls\b/ at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Conf/Parser.pm line 1173.
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 67 0 non-token data: nspam
    0.000 0 29 0 non-token data: nham
    0.000 0 1588 0 non-token data: ntokens
    0.000 0 1247338497 0 non-token data: oldest atime
    0.000 0 1249317365 0 non-token data: newest atime
    0.000 0 1249317143 0 non-token data: last journal sync atime
    0.000 0 0 0 non-token data: last expiry atime
    0.000 0 0 0 non-token data: last expire atime delta
    0.000 0 0 0 non-token data: last expire reduction count
    [r...@localhost 3.002005]#
    //stop

This is the output... the command isn't running. But I gave almost the same output to the forum. The only difference was that my first post had no prompt.
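The counters in a dump like the one in this message can be pulled out with a few lines of Python. This is only a sketch: it assumes every relevant line contains `non-token data: <name>` with the value in the third column, as in the output pasted here. `parse_dump_magic` is a name made up for this example and is not part of SpamAssassin.

```python
# Sketch: extract counters from `sa-learn --dump magic` output.
# Assumes the column layout seen in the output pasted in this thread.

SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 67 0 non-token data: nspam
0.000 0 29 0 non-token data: nham
0.000 0 1588 0 non-token data: ntokens
"""

def parse_dump_magic(text):
    """Return {name: value} for every 'non-token data' line."""
    marker = "non-token data: "
    result = {}
    for line in text.splitlines():
        if marker not in line:
            continue  # skip warnings, prompts and blank lines
        fields, name = line.split(marker, 1)
        columns = fields.split()
        if len(columns) < 3:
            continue  # not the layout we expect
        result[name.strip()] = int(columns[2])  # third column holds the value
    return result

counters = parse_dump_magic(SAMPLE)
print(counters["nspam"], counters["nham"], counters["ntokens"])
```

Reading nspam and nham this way makes it easy to compare them against bayes_min_spam_num and bayes_min_ham_num.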
Re: SA-learn (spamassassin)
I read the SpamAssassin docs... I found the following:

sa-learn --spam
    Learn the input message(s) as spam. If you have previously learnt any of the messages as ham, SpamAssassin will forget them first, then re-learn them as spam. Alternatively, if you have previously learnt them as spam, it'll skip them this time around. If the messages have already been filtered through SpamAssassin, the learner will ignore any modifications SpamAssassin may have made.

...and the following:

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
    To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.

I changed the value to 1 (I use this for testing and for my own learning; it is my homework). As I see it, Bayes learning was activated. When I use sa-learn, Bayes learns that the mail is spam. And Bayes learns the signatures... So it is strange to me that when I send the same mail again, Bayes doesn't mark it as spam. I don't understand this. I meet all the conditions: sa-learn --spam --file mail, bayes_min_spam_num 1. The date of the database also changed (but the size stayed the same). nspam was increased...

I really don't understand what the use of SA-LEARN is! I have the feeling that Bayes doesn't work correctly: Bayes ignores sa-learn. Perhaps I am silly, but I don't understand how it works :((

I am interested in how to tell Bayes: THIS MAIL IS SPAM (by using sa-learn); WHEN THIS SAME MAIL COMES AGAIN, YOU HAVE TO MARK IT AS SPAM! I know that Bayes finds similar elements between mails and decides accordingly. But when I mark a mail as spam and the next mail has 100% similarity, Bayes HAS TO mark it as SPAM. That is logical to me.
Re: SA-learn (spamassassin)
On Sun, 2 Aug 2009 04:36:34 -0700 (PDT) monolit xmull...@gmail.com wrote:
> I changed the value to 1 (I use this for testing and for my own learning;
> it is my homework). As I see it, Bayes learning was activated. When I use
> sa-learn, Bayes learns that the mail is spam. And Bayes learns the
> signatures... So it is strange to me that when I send the same mail again,
> Bayes doesn't mark it as spam. I don't understand this.

What you said before was that you corrected autolearn=ham to spam with sa-learn, and another similar spam then also had autolearn=ham.

Autolearning is not based on the Bayes result; it's based on other SpamAssassin rules. However, it won't autolearn in the opposite direction to a strong Bayes result, which is why it's a good idea to train manually first.

My guess is that you've fed it your one spam, but haven't fed it enough ham to satisfy bayes_min_ham_num, so there is no Bayes result and nothing to stop autolearning in the wrong direction.

It really is pretty useless to speculate about what it's doing when you are misusing it like this. If you just want to play with it, then feed it 10 hams and 10 spams and set the limits to 10. It won't be very accurate, but it should behave sensibly.
Re: SA-learn (spamassassin)
On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
> On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:
> > > when I have taught Bayes by hand (sa-learn --spam --file mail) that
> > > this mail is spam? I have explicitly set bayes_min_spam_num 1 in
> > > local.cf. This means that one mail is sufficient for Bayes to learn
> > > (according to me). But it doesn't work.
> >
> > Do NOT do that. Unless you *really* understand the implications. Which
> > you don't. It's a default for a reason. It's a counter-measure against
> > bad learning, to force at least some MINIMAL manual training, before
> > auto-learning kicks in. You just side-stepped that.
>
> AFAIK it doesn't affect autolearning at all, bayes_min_spam_num
> bayes_min_ham_num control when scoring starts.

Well, it *does* nonetheless. *shrug* As per the docs, that threshold controls when Bayes activates. Nothing more, nothing less.

Want to see for yourself?

    $ echo | spamassassin --cf='score EMPTY_MESSAGE 6' --cf='score MISSING_DATE 6'
    X-Spam-Status: Yes, score=17.3 required=8.0 tests=EMPTY_MESSAGE,MISSING_DATE,
        MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
        NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5

    $ sa-learn --dump magic
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 2 0 non-token data: nspam
    0.000 0 1 0 non-token data: nham
    0.000 0 20 0 non-token data: ntokens
Re: SA-learn (spamassassin)
On Sun, 02 Aug 2009 17:15:52 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:
> On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
> > On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:
> > > It's a counter-measure against bad learning, to force at least some
> > > MINIMAL manual training, before auto-learning kicks in.
> >
> > AFAIK it doesn't affect autolearning at all, bayes_min_spam_num
> > bayes_min_ham_num control when scoring starts.
>
> Well, it *does* nonetheless. *shrug* As per the docs, that threshold
> controls when Bayes activates. Nothing more, nothing less.
>
> Want to see for yourself?
> ..
> X-Spam-Status: Yes, score=17.3 required=8.0 tests=EMPTY_MESSAGE,MISSING_DATE,
>     MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
>     NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5

If you read back you'll see that that's consistent with what I wrote and the opposite of what you wrote. I said that the limits don't affect autolearning, just scoring (activation). Whatever you think you wrote, what you actually wrote was:

    "to force at least some MINIMAL manual training, before auto-learning kicks in"

There's no ambiguity there: the use of the word "force" implies that manual training is a prerequisite to auto-learning.
Re: SA-learn (spamassassin)
On Sun, 2009-08-02 at 18:31 +0100, RW wrote:
> > > AFAIK it doesn't affect autolearning at all, bayes_min_spam_num
> > > bayes_min_ham_num control when scoring starts.
> >
> > Well, it *does* nonetheless. *shrug*
>
> If you read back you'll see that that's consistent with what I wrote and
> the opposite of what you wrote.

Nah, I did set the thresholds to 1. :)

> I said that the limits don't affect autolearning, just scoring (activation).

Damn. My test-case was non-conclusive, I failed to crosscheck. :/

You are correct, auto-learning is not affected by these thresholds. SA does bootstrap Bayes training, even if nspam/nham still is below the limits. Sorry, my bad.
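The conclusion of this sub-thread can be restated as a small Python sketch: the bayes_min_ham_num and bayes_min_spam_num thresholds gate classification only, while learning is always allowed. The class and method names below are invented for illustration and do not mirror SpamAssassin's internals; the counts 67 and 29 are the nspam/nham values posted elsewhere in the thread.

```python
# Toy model of the behaviour agreed on above; this is not SpamAssassin code.

class ToyBayes:
    def __init__(self, min_ham_num=200, min_spam_num=200):
        self.min_ham_num = min_ham_num
        self.min_spam_num = min_spam_num
        self.nham = 0
        self.nspam = 0

    def learn(self, is_spam):
        """Learning (manual or automatic) never checks the thresholds."""
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def can_classify(self):
        """Scoring starts only once both counters reach their minimum."""
        return (self.nham >= self.min_ham_num
                and self.nspam >= self.min_spam_num)

bayes = ToyBayes()              # defaults of 200, as quoted from the docs
for _ in range(67):
    bayes.learn(is_spam=True)
for _ in range(29):
    bayes.learn(is_spam=False)
print(bayes.nspam, bayes.nham, bayes.can_classify())
```

With the default limits the counters keep growing but no Bayes result is produced yet, which matches "SA does bootstrap Bayes training, even if nspam/nham still is below the limits".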
Re: SA-learn (spamassassin)
From the SA web pages:

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
    To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.

I have a theory... I know you will think it's bad, but I tried to explain how I understand the SA documentation. When I set bayes_min_spam_num 1, it means that the Bayes learning system will be activated. And now, for example: I get a mail. I use sa-learn --spam --file mail. SA saves the mail (or some signature) to the database. And when I get the same mail again, Bayes looks in the database and says: "this is the same mail as the one in my database that is marked as spam", and it marks the mail as spam. To me that is logical.

What is strange is that when I use sa-learn, the database doesn't grow in size, but its modification time matches the time when I started sa-learn.
Re: SA-learn (spamassassin)
On Sun, 2 Aug 2009 11:53:53 -0700 (PDT), monolit xmull...@gmail.com wrote:
> What is strange is that when I use sa-learn, the database doesn't grow in
> size, but its modification time matches the time when I started sa-learn.

And the question is?
Re: SA-learn (spamassassin)
The question is logical. When SA learns new spam/ham, SA has to write new info to the database, and I think the database has to increase in size. For example, if you have a *.doc file and you modify it by adding several words, the *.doc file will get bigger (increase in size).
Re: SA-learn (spamassassin)
On Sun, 2009-08-02 at 11:53 -0700, an anonymous Nabble user wrote:
> I have a theory... I know you will think it's bad, but I tried to explain
> how I understand the SA documentation. When I set bayes_min_spam_num 1, it
> means that the Bayes learning system will be activated. And now, for
> example: I get a

As I just settled with RW in this very thread, the number of spam exceeding the bayes_min_spam_num value does not activate Bayes *learn*ing. It means that Bayes will classify mail -- based on what it learned before.

Learning, whether manual or automatic, always is available if use_bayes and bayes_auto_learn are enabled.

The bayes_min_(ham|spam)_num values ONLY control how many messages Bayes needs to have learned before it should start classifying mail.

And again, 1 is not a sane number.

> mail. I use sa-learn --spam --file mail. SA saves the mail (or some
> signature) to the database. And when I get the same mail again, Bayes
> looks in the database and says: "this is the same mail as the one in my
> database that is marked as spam", and it marks the mail as spam. To me
> that is logical.

No. *sigh* I did explain this earlier today. This is NOT how Bayes works.

Bayes does NOT keep signatures of entire messages. Instead, it keeps track of *tokens*, and the number of times they have been seen in ham or spam. Think of tokens as words.

Please do read up on Bayes. And please stop re-iterating this false assumption.

Given that you keep repeating "some signature" of a message, and your other thread regarding Razor (which does actually calculate some signatures for a message) -- I have a feeling you are confusing Bayes with Razor. They are entirely unrelated and do not use the same mechanisms.

> What is strange is that when I use sa-learn, the database doesn't grow in
> size, but its modification time matches the time when I started sa-learn.

It is a database. It is not a flat text file. There is nothing strange about updating values in a database and not seeing it inflate in proportion to your input data.
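To make the token idea concrete, here is a deliberately simplified Python sketch. It splits a message on whitespace and counts in how many learned ham and spam messages each token appeared; the message itself is never stored. Real SpamAssassin tokenization is far richer (headers, URIs and more), so `tokenize` and `learn` are hypothetical helpers and not the actual implementation.

```python
from collections import Counter

spam_counts = Counter()   # token -> number of spam messages containing it
ham_counts = Counter()    # token -> number of ham messages containing it

def tokenize(message):
    """Very naive tokenizer: lower-cased, whitespace-separated words."""
    return set(message.lower().split())

def learn(message, is_spam):
    """Update per-token counters; the message text is not kept."""
    target = spam_counts if is_spam else ham_counts
    for token in tokenize(message):
        target[token] += 1

learn("Subject: viagra viagra", is_spam=True)
learn("Subject: meeting notes attached", is_spam=False)

print(spam_counts["viagra"], ham_counts["viagra"])      # seen in spam only
print(spam_counts["subject:"], ham_counts["subject:"])  # seen in both
```

A token that shows up equally often in ham and spam, like the "subject:" token here, carries no evidence either way, which is one reason a single hand-written test mail says little about Bayes.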
Re: SA-learn (spamassassin)
On Sun, 2 Aug 2009 13:20:41 -0700 (PDT), monolit xmull...@gmail.com wrote:
> The question is logical.

So is Google :)

> When SA learns new spam/ham, SA has to write new info to the database, and
> I think the database has to increase in size.

No. My Bayes DB is around 150M, but all my mail in webmail is 800M, so where is the rest in Bayes? :)

> For example, if you have a *.doc file and you modify it by adding several
> words, the *.doc file will get bigger (increase in size).

If you use Bayes on MySQL and dump the data, you will see that it does not just add new words; it also counts how often each word is seen in spam vs. ham. And these words are not stored as we write them here; they are encoded into signatures that don't use much room in the DB.

One example: try taking the md5 sum of your email address. It will be the same length every time, no matter how many characters your address has.

--
Benny Pedersen
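Benny's md5 example is easy to try in Python with the standard hashlib module. The addresses below are placeholders, and md5 is used only because it is the function named in the message; the sketch makes no claim about which hash SpamAssassin uses for its tokens.

```python
import hashlib

def md5_hex(text):
    """Hex digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Placeholder addresses of very different lengths.
addresses = ["a@example.org", "a.much.longer.address@mail.example.org"]

for address in addresses:
    # Input length varies, digest length does not.
    print(len(address), len(md5_hex(address)))
```

Both digests come out as 32 hex characters, so a store of hashed tokens grows with the number of distinct tokens and not with their length.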
Re: SA-learn (spamassassin)
To Benny Pedersen: I understand your explanation about the growth of the SpamAssassin database. Your example with md5 is clear. OK, thank you very much!

To Karsten Bräckelmann-2: I want to apologize for my approach. I use Ubuntu and other forums because I am desperate: my homework was to install, configure and run an antispam setup (SpamAssassin, ClamAV, Clamsmtp, Razor, Postfix). Now I am under pressure because tomorrow I have to deliver my solution to my boss... I must explain to him how it works and so on.

"the number of spam exceeding the bayes_min_spam_num value does not activate Bayes *learn*ing. It means that Bayes will classify mail -- based on what it learned before." "it keeps track of *tokens*, and the number they have been seen in ham or spam."

Your explanation is confusing to me, because you claim the value of min_spam_num means that "Bayes will classify mail -- based on what it learned before". My min_spam_num value is 1. I get the first mail. Subject: viagra; body: viagra. I use sa-learn --spam on this mail. I get a new mail: Subject: viagra; body: viagra. What will Bayes do, according to you? Keep in mind your words: "The bayes_min_(ham|spam)_num values ONLY control, how many messages Bayes needs to have learned, before it should start classifying mail." => my Bayes can classify mail (because the min_spam_num value is 1 => the condition is met). What now? Will my new mail be marked as spam? Or will it get a higher score...?

"And again, 1 is not a sane number." - I am trying to explain to you that this is only homework. Why the number 1? Because I want to see with my own eyes how Bayes works. I don't have time to find a lot of real spam (I know the number must be bigger, about 1000; that's OK, I knew that).
Re: SA-learn (spamassassin)
On Sun, 2009-08-02 at 14:43 -0700, an anonymous Nabble user wrote:
> To Karsten Bräckelmann-2: I want to apologize for my approach. I use
> Ubuntu and other forums because I am desperate: my homework was to
> install, configure and run an antispam setup (SpamAssassin, ClamAV,
> Clamsmtp, Razor, Postfix). Now I am under pressure because tomorrow I have
> to deliver my solution to my boss... I must explain to him how it works
> and so on.

Good luck with that.

Utterly fucked-up quoting, err, dumping of previous posts intermixed with comments, fixicated.

> > the number of spam exceeding the bayes_min_spam_num value does not
> > activate Bayes *learn*ing. It means that Bayes will classify mail --
> > based on what it learned before.
>
> > it keeps track of *tokens*, and the number they have been seen in ham
> > or spam.
>
> Your explanation is confusing to me, because you claim the value of
> min_spam_num means that "Bayes will classify mail -- based on what it
> learned before". My min_spam_num value is 1. I get the first mail.
> Subject: viagra; body: viagra. I use sa-learn --spam on this mail. I get a
> new mail: Subject: viagra; body: viagra. What will Bayes do, according to
> you? Keep in mind your words

Bayes will check the tokens against its database. Based on the number of occurrences of each token in ham and spam, Bayes will return whether the mail appears spammy or hammy (based on what it learned before), and its confidence in that assessment. This classification (ham or spam) and confidence will be scored by SA.

Keep in mind there are a LOT more tokens in a message than merely the words in the Subject and Body. This DOES have a severe impact on your results if your test spam is a self-generated message with the word Viagra as Subject and Body. Nope, this is not a proper test environment.

> > The bayes_min_(ham|spam)_num values ONLY control, how many messages
> > Bayes needs to have learned, before it should start classifying mail.
>
> => my Bayes can classify mail (because the min_spam_num value is 1 => the
> condition is met). What now? Will my new mail be marked as spam? Or will
> it get a higher score...?

It will be classified (by Bayes) based on the tokens in the message and the previously learned statistics. Bayes does NOT only mark spam. It also can report a message to look like ham.

Anyway, I asked you before to provide sa-learn --dump magic output. You didn't. Given the intro, I seriously wonder whether the user you train Bayes as and the user you scan mail as are the same anyway.
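The classification step described above can be illustrated with textbook naive Bayes in Python. This is an assumption-laden sketch, not SpamAssassin's algorithm: SpamAssassin combines token evidence differently, and every count, token and function name below is invented for the example.

```python
import math

# Hypothetical learned statistics: token -> (spam_count, ham_count)
token_stats = {
    "viagra": (40, 1),
    "meeting": (2, 30),
    "subject:": (50, 50),
}
nspam, nham = 50, 50  # made-up numbers of learned spam and ham messages

def token_spam_probability(token):
    """Estimate P(spam | token) with add-one smoothing.

    Tokens that were never learned come out at exactly 0.5 (no evidence).
    """
    spam_count, ham_count = token_stats.get(token, (0, 0))
    spam_rate = (spam_count + 1) / (nspam + 2)
    ham_rate = (ham_count + 1) / (nham + 2)
    return spam_rate / (spam_rate + ham_rate)

def message_spam_probability(tokens):
    """Combine the token probabilities by summing their log-odds."""
    log_odds = 0.0
    for token in tokens:
        p = token_spam_probability(token)
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(message_spam_probability(["subject:", "viagra"]), 3))
print(round(message_spam_probability(["subject:", "meeting"]), 3))
```

The result is a probability rather than a yes/no verdict, and it can lean toward ham as well as toward spam. With very few learned messages the per-token estimates are dominated by smoothing, which is why a database trained on one or two mails gives unreliable answers.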
Re: SA-learn (spamassassin)
monolit wrote:
> The question is logical. When SA learns new spam/ham, SA has to write new
> info to the database, and I think the database has to increase in size.
> For example, if you have a *.doc file and you modify it by adding several
> words, the *.doc file will get bigger (increase in size).

The database doesn't need to grow in size. A Berkeley DB file can contain free space. This is done to avoid constantly shrinking and growing the file on disk. Deleted elements are merely marked as free space for later use.

Therefore, data can be added to a Berkeley DB file without an increase in file size.
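The free-space idea can be mimicked with a toy slot store in Python. It only illustrates the principle (freed slots are reused before the store grows) and says nothing about how Berkeley DB actually lays out its pages; `ToySlotFile` is a made-up name.

```python
class ToySlotFile:
    """Fixed set of slots; deleted slots are kept and reused later."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count   # pre-allocated free space

    def size(self):
        return len(self.slots)             # stands in for the file size

    def put(self, value):
        for index, slot in enumerate(self.slots):
            if slot is None:               # reuse free space first
                self.slots[index] = value
                return index
        self.slots.append(value)           # grow only when completely full
        return len(self.slots) - 1

    def delete(self, index):
        self.slots[index] = None           # mark as free, never shrink

store = ToySlotFile(slot_count=4)
first = store.put("token-a")
store.put("token-b")
size_before = store.size()
store.delete(first)
store.put("token-c")                       # lands in the freed slot
size_after = store.size()
print(size_before, size_after)
```

New data went in, yet the reported size stayed the same, which is the effect observed on the bayes_* files.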
SA-learn (spamassassin)
Hello,

I found out the following: my spamd daemon is running as root. But in master.cf (the Postfix configuration file) I have the following lines:

    # Postfix master process configuration file.  For details on the format
    # of the file, see the master(5) manual page (command: "man 5 master").
    #
    # ==========================================================================
    # service type  private unpriv  chroot  wakeup  maxproc command + args
    #               (yes)   (yes)   (yes)   (never) (100)
    # ==========================================================================
    smtp      inet  n       -       n       -       -       smtpd
      -o content_filter=spamfilter:dummy
    # ====================================================================
    # Interfaces to non-Postfix software. Be sure to examine the manual
    # pages of the non-Postfix software to find out what options it wants.
    #
    # Many of the following services use the Postfix pipe(8) delivery
    # agent.  See the pipe(8) man page for information about ${recipient}
    # and other message envelope options.
    # ====================================================================
    spamfilter unix -       n       n       -       -       pipe
      flags=Rq user=spamfilter argv=/usr/local/bin/spamfilter -f ${sender} -- ${recipient}

spamfilter is the user for SpamAssassin (spamd), but it seems strange to me that spamd is running as root. I configured master.cf according to h-t-t-p://onetforum.com/fourm/viewtopic.php?p=27 (Kalinga's Community Support Forum, "Integrating Spam Assassin with Postfix"; replace h-t-t-p with http). It is recommended by the original SpamAssassin web pages.

In local.cf I have: bayes_path /home/spamfilter/.spamassassin/bayes.

Now, when I send a mail (for example at 21:00) that SpamAssassin marks autolearn=spam and I look in /home/spamfilter/.spamassassin, I can see that the files bayes_toks and bayes_seen were modified at 21:00, but their size did not change. How is that possible? When SpamAssassin changes the files, they should increase in size... When I run sa-learn --dump magic, I can see that the nspam row increased by 1. This confirms that autolearn works (but the database does not increase in size).

My second problem: I get a mail marked autolearn=ham. I take the mail and run the following command: sa-learn --spam --file mail (at 21:55). When I run sa-learn --dump magic, I can see that nspam increased by 1, which is OK. But when I look in /home/spamfilter/.spamassassin, I can see that the database files were changed but their size did not change. Is that normal?

And the last problem: when I get a mail marked autolearn=ham, I run sa-learn --spam --file mail. When I get the same mail again, SpamAssassin again marks it autolearn=ham. How is that possible when I have taught Bayes by hand (sa-learn --spam --file mail) that this mail is spam? I have explicitly set bayes_min_spam_num 1 in local.cf. This means that one mail is sufficient for Bayes to learn (according to me). But it doesn't work.

Thanks for any advice (I need it urgently). Sorry for my terrible English.
Re: SA-learn (spamassassin)
On Sat, 2009-08-01 at 16:13 -0700, an anonymous Nabble user wrote:
> And the last problem: when I get a mail marked autolearn=ham, I run
> sa-learn --spam --file mail. When I get the same mail again, SpamAssassin
> again marks it autolearn=ham. How is that possible when I have taught
> Bayes by hand (sa-learn --spam --file mail) that this mail is spam? I have
> explicitly set bayes_min_spam_num 1 in local.cf. This means that one mail
> is sufficient for Bayes to learn (according to me). But it doesn't work.

Do NOT do that. Unless you *really* understand the implications. Which you don't.

It's a default for a reason. It's a counter-measure against bad learning, to force at least some MINIMAL manual training, before auto-learning kicks in. You just side-stepped that.

You should read some docs on Bayes, before messing with its settings.
Re: SA-learn (spamassassin)
On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann guent...@rudersport.de wrote:
> On Sat, 2009-08-01 at 16:13 -0700, an anonymous Nabble user wrote:
> > And the last problem: when I get a mail marked autolearn=ham, I run
> > sa-learn --spam --file mail. When I get the same mail again,
> > SpamAssassin again marks it autolearn=ham. How is that possible

It's not the same spam; it'll have different headers.

> > when I have taught Bayes by hand (sa-learn --spam --file mail) that
> > this mail is spam? I have explicitly set bayes_min_spam_num 1 in
> > local.cf. This means that one mail is sufficient for Bayes to learn
> > (according to me). But it doesn't work.

It's not like Pyzor, where you set a threshold. It's a statistical filter: you have to feed it hundreds of mails before it produces reliable results, hence the 200 spam minimum.

> Do NOT do that. Unless you *really* understand the implications. Which you
> don't. It's a default for a reason. It's a counter-measure against bad
> learning, to force at least some MINIMAL manual training, before
> auto-learning kicks in. You just side-stepped that.

AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and bayes_min_ham_num control when scoring starts.