Re: SA-learn (spamassassin)

2009-08-04 Thread Matus UHLAR - fantomas
On 03.08.09 09:43, monolit wrote:
 If you are so clever (I am a poor English speaker), could you explain this
 problem to me by mail, in Slovak? Is that a problem for you? I didn't find
 enough good material on this topic in Czech.

Well,
- do not train on fake messages.
- do not modify bayes_min_*_num.
- train on plenty of spam and plenty of ham.

You can also keep the trained corpus somewhere for later review (in case
you trained anything incorrectly), or to refill the Bayes DB if you lose it.

I don't know if there's anything to translate.

You should understand that any mail contains many tokens you don't even
notice.
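A toy Python sketch of that point. The `toy_tokens` helper, its "H*" header prefix and the sample message are hypothetical, and SpamAssassin's real tokenizer is far more elaborate; it only shows how little of a self-written "viagra" test mail is actually body text.

```python
import re
from email import message_from_string

# Hypothetical self-written test mail, like the "viagra" experiment in this thread.
RAW = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.net (Postfix) with ESMTP id 4F2A1B; Mon, 3 Aug 2009 09:15:02 +0200
From: First Account <first@example.org>
To: Second Account <second@example.net>
Subject: viagra
Date: Mon, 3 Aug 2009 09:15:00 +0200
Message-ID: <20090803071500.4F2A1B@mail.example.org>
User-Agent: Hypothetical Mailer 1.0

viagra
"""

WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9.@_-]*")

def toy_tokens(raw):
    """Toy tokenizer: header tokens tagged with the header name, plus body words."""
    msg = message_from_string(raw)
    tokens = []
    for name, value in msg.items():
        # Tagging with the header name makes "viagra" in the Subject a
        # different token from "viagra" in the body.
        tokens += [f"H*{name}:{w.lower()}" for w in WORD.findall(value)]
    tokens += [w.lower() for w in WORD.findall(msg.get_payload())]
    return tokens

toks = toy_tokens(RAW)
body = [t for t in toks if not t.startswith("H*")]
print(len(toks), "tokens in total, only", len(body), "from the body:", body)
```

Every header token is evidence too, and those differ a lot between a hand-written test mail and real spam.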
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
How does cat play with mouse? cat /dev/mouse


Re: SA-learn (spamassassin)

2009-08-04 Thread Benny Pedersen
On Tue, 4 Aug 2009 14:39:44 +0200, Matus UHLAR - fantomas
uh...@fantomas.sk wrote:
 On 03.08.09 09:43, monolit wrote:
 If you are so clever (I am a poor English speaker), could you explain this
 problem to me by mail, in Slovak? Is that a problem for you? I didn't find
 enough good material on this topic in Czech.
 
 Well,
 - do not train on fake messages.

Why not?

 - do not modify bayes_min_*_num

Why not?

 - train on plenty of spam and plenty of ham.

Bayes just needs as many samples as possible of what is spam and what is
not spam; keeping nham and nspam nearly equal is a sign of good training, IMHO.

 You can also keep the trained corpus somewhere for later review (in case
 you trained anything incorrectly), or to refill the Bayes DB if you lose it.

Well, it might be good advice, but spam changes every day, which is also
why spammers have so much success getting past SpamAssassin.

 I don't know if there's anything to translate.

Maybe not.

 You should understand that any mail contains many tokens you don't even
 notice.

That's the point of Bayes.

-- 
Benny Pedersen


Re: SA-learn (spamassassin)

2009-08-03 Thread monolit

Good morning. After Bayes learning, the output of sa-learn --dump magic shows
nspam/nham increased by 1. I ran the command several times. I wrote a mail
with Subject: viagra and body: viagra, and sent it from my first account to
my second account (score 0.4). Then I used sa-learn --spam on this mail. I
wrote the same mail and sent it from the first account to the second again.
This time the mail got a higher score, 2.4. I took this mail and used
sa-learn --spam on it. I wrote the same mail and repeated the sending (from
the first account to the second). The score was higher again, 3.4. I tried
it several more times, but the score didn't grow any further... That was my
small experiment with Bayes scoring.

My spamd process runs as root, and I started sa-learn as root. BUT there is
a database in the /root directory and the same database in the
/home/spamfilter directory. spamfilter is the user named in master.cf. In
SpamAssassin's local.cf I have an entry for the Bayes database, and its
path is /home/spamfilter... When I ran sa-learn as root, I checked the
update time of the databases: the database of user spamfilter was correctly
updated (the one under root was not).

I know it is strange and confusing to use two users for this. I would like
everything to run under one user, but I don't know how to start spamd as
spamfilter. I am not sure whether that is right... maybe spamd should run as
root.
Here is my modification of master.cf (Postfix). This modification is
recommended by the SpamAssassin web pages.

smtp  inet  n   -   n   -   -   smtpd
 -o content_filter=spamfilter:dummy


# Interfaces to non-Postfix software. Be sure to examine the manual
# pages of the non-Postfix software to find out what options it wants.
# 
spamfilter unix -   n   n   -   -   pipe
 flags=Rq user=spamfilter argv=/usr/local/bin/spamfilter -f ${sender} -- ${recipient}
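The thread never shows what /usr/local/bin/spamfilter contains, so here is a hypothetical Python sketch of what such a pipe command might do: feed the message to spamc, then hand the result back to Postfix via sendmail. The spamc and sendmail paths and the helper names are assumptions. Because pipe(8) runs it with user=spamfilter, spamc would, as I understand it, report that user to spamd, and a root-run spamd would then switch to spamfilter's preferences and Bayes DB unless bayes_path points elsewhere.

```python
import subprocess

# Assumed install locations; adjust for your system.
SPAMC = "/usr/bin/spamc"
SENDMAIL = "/usr/sbin/sendmail"

def build_commands(sender, recipients, spamd_user=None):
    """Return (spamc argv, sendmail argv) for 'spamfilter -f ${sender} -- ${recipient}'."""
    spamc = [SPAMC]
    if spamd_user:
        # spamc -u asks spamd to process the message as this user.
        spamc += ["-u", spamd_user]
    # -oi: a line with a single dot does not end the message.
    # "--" stops option parsing, so unusual recipient addresses stay safe.
    sendmail = [SENDMAIL, "-oi", "-f", sender, "--", *recipients]
    return spamc, sendmail

def filter_and_reinject(raw_message: bytes, sender, recipients, spamd_user=None):
    """Scan with spamc, then re-inject the (possibly tagged) message."""
    spamc, sendmail = build_commands(sender, recipients, spamd_user)
    scanned = subprocess.run(spamc, input=raw_message,
                             stdout=subprocess.PIPE, check=True).stdout
    subprocess.run(sendmail, input=scanned, check=True)
```

Re-injecting through sendmail uses Postfix's pickup service rather than the smtp service, so the content_filter set on smtp is not applied a second time.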

Thank you for explaining how Bayes works, and for the time you have
devoted to me.
-- 
View this message in context: 
http://www.nabble.com/SA-learn-%28spamassassin%29-tp24773517p24786173.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: SA-learn (spamassassin)

2009-08-03 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 23:50 -0700, monolit wrote:
 Good morning. After Bayes learning, the output of sa-learn --dump magic shows
 nspam/nham increased by 1. I ran the command several times. I wrote a mail with

I did not ask for the difference. I asked for the output of the command.

 Subject: viagra and body: viagra, and sent it from my first account to my
 second account (score 0.4). Then I used sa-learn --spam on this mail. I wrote

As I told you before, there are *lots* of other tokens, which differ
greatly between your self-written messages and spam. Measuring Bayes by
observing a single token is broken.

 the same mail and sent it from the first account to the second again. This
 time the mail got a higher score, 2.4. I took this mail and used sa-learn
 --spam on it. I wrote the same mail and repeated the sending (from the first
 account to the second). The score was higher again, 3.4. I tried it several
 more times, but the score didn't grow any further... That was my small
 experiment with Bayes scoring.

Without looking at the headers and the SA rules hit, there's no evidence
Bayes did anything at all. As described, this easily could be AWL, too.

Oh my, this horse is dead anyway.


 Thank you for explaining how Bayes works, and for the time you have devoted
 to me.

-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: SA-learn (spamassassin)

2009-08-03 Thread monolit

I gave you the output of the command sa-learn --dump magic. As for the end of
your reply... it could not be AWL, because I have AWL disabled. I was lucky
today... my chief was busy. I will present my solution tomorrow :)



Re: SA-learn (spamassassin)

2009-08-03 Thread Matus UHLAR - fantomas
On 03.08.09 09:20, monolit wrote:
 I gave you the output of the command sa-learn --dump magic. As for the end of
 your reply... it could not be AWL, because I have AWL disabled. I was lucky
 today... my chief was busy. I will present my solution tomorrow :)

Better not present it until you finally understand how it works...



Re: SA-learn (spamassassin)

2009-08-03 Thread Karsten Bräckelmann
On Mon, 2009-08-03 at 09:20 -0700, an anonymous Nabble user wrote:
 I gave you the output of the command sa-learn --dump magic.

No, you did NOT provide the output.  But hey, there's no point in
arguing over this or further following up with this thread anyway.





Re: SA-learn (spamassassin)

2009-08-03 Thread monolit

If you are so clever (I am a poor English speaker), could you explain this
problem to me by mail, in Slovak? Is that a problem for you? I didn't find
enough good material on this topic in Czech.



Re: SA-learn (spamassassin)

2009-08-03 Thread monolit

Dear Karl, I don't know which command output you need. You said sa-learn
--dump magic. What am I supposed to make of your request? You have me totally
confused...



Re: SA-learn (spamassassin)

2009-08-03 Thread Benny Pedersen
On Mon, 3 Aug 2009 09:43:49 -0700 (PDT), monolit xmull...@gmail.com
wrote:
 If you are so clever (I am a poor English speaker), could you explain this
 problem to me by mail, in Slovak? Is that a problem for you? I didn't find
 enough good material on this topic in Czech.

It's hard to help when we don't both understand what to do, and it's not
your bad Danish that is the problem, either.



Re: SA-learn (spamassassin)

2009-08-03 Thread Karsten Bräckelmann
On Mon, 2009-08-03 at 09:49 -0700, an annoying Nabble user wrote:
 Dear Karl, I don't know which command output you need. You said sa-learn
 --dump magic. What am I supposed to make of your request? You have me totally
 confused...

Karl?


Dear anonymous Nabble user,

what I requested from you is the output of that command. The actual
output you get when running the command. Not a statement by you, that
you have run the command -- but the output, copied and pasted. You do
know how to copy-n-paste, don't you?

Oh, yeah, you do. We established that before. After all, this entire
thread started as a copy-n-paste from an Ubuntu forum.


Anyway, end-of-thread for me.  Don't bother sending the output now.





Re: SA-learn (spamassassin)

2009-08-03 Thread monolit

I am really sorry, I am tired from work. I made a mistake with your name. This
task is serious, please don't joke ("Oh, yeah, you do. We established that
before. After all, this entire thread started as a copy-n-paste from an
Ubuntu forum."). I started the thread here and then copied it to the Ubuntu
forum. As I said, I really need help... and I am not anonymous, I have a
nick...
The command didn't run!

 [r...@localhost 3.002005]# sa-learn --dump magic  //start
Unrecognized escape \g passed through in regex; marked by <-- HERE in m/(?i)\g <-- HERE irls\b/ at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Conf/Parser.pm line 1173.
0.000  0  3  0  non-token data: bayes db version
0.000  0 67  0  non-token data: nspam
0.000  0 29  0  non-token data: nham
0.000  0   1588  0  non-token data: ntokens
0.000  0 1247338497  0  non-token data: oldest atime
0.000  0 1249317365  0  non-token data: newest atime
0.000  0 1249317143  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count
[r...@localhost 3.002005]#  //stop

This is the output... the command doesn't run properly. But I gave almost the
same output in the forum; the only difference was that my first post didn't
include the prompt.
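The counters in that output can be read mechanically. A minimal Python sketch, assuming the column layout shown above (the `parse_magic` helper is hypothetical), extracts the third column of the "non-token data" lines and prints the spam:ham ratio. With the default bayes_min_spam_num and bayes_min_ham_num of 200, neither 67 spam nor 29 ham is enough for Bayes to classify mail.

```python
import re

# Lines copied from the sa-learn --dump magic output above.
DUMP = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0 67  0  non-token data: nspam
0.000  0 29  0  non-token data: nham
0.000  0   1588  0  non-token data: ntokens
"""

# probability, spam count, value, ham count, label
LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")

def parse_magic(text):
    """Map each 'non-token data' label to the value in the third column."""
    values = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values

magic = parse_magic(DUMP)
ratio = magic["nspam"] / magic["nham"]
print(f"nspam={magic['nspam']} nham={magic['nham']} spam:ham ratio={ratio:.2f}")
```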




Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

I read the SpamAssassin docs... and found the following:
sa-learn
--spam
Learn the input message(s) as spam. If you have previously learnt any of
the messages as ham, SpamAssassin will forget them first, then re-learn them
as spam. Alternatively, if you have previously learnt them as spam, it'll
skip them this time around. If the messages have already been filtered
through SpamAssassin, the learner will ignore any modifications SpamAssassin
may have made. 
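The re-learning behaviour described in that doc excerpt can be modelled with a small Python sketch. `ToyBayesStore` is hypothetical: it keeps per-token spam/ham counts (roughly the role of the bayes_toks file) and a map of already-learned messages (roughly bayes_seen). Keying messages by a plain message ID and counting each token once per message are simplifications.

```python
from collections import Counter

class ToyBayesStore:
    """Toy token store: per-token counts, message totals, and a 'seen' map."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0
        self.nham = 0
        self.seen = {}  # message id -> "spam" or "ham"

    def _apply(self, tokens, label, delta):
        counts = self.spam_counts if label == "spam" else self.ham_counts
        for tok in set(tokens):  # each token counted once per message
            counts[tok] += delta
        if label == "spam":
            self.nspam += delta
        else:
            self.nham += delta

    def learn(self, msgid, tokens, label):
        """Learn a message; returns False if it was already learned this way."""
        previous = self.seen.get(msgid)
        if previous == label:
            return False                       # already learned: skip it
        if previous is not None:
            self._apply(tokens, previous, -1)  # forget the old class first
        self._apply(tokens, label, +1)
        self.seen[msgid] = label
        return True

db = ToyBayesStore()
db.learn("<1@example.org>", ["viagra", "cheap"], "ham")   # first learned as ham
db.learn("<1@example.org>", ["viagra", "cheap"], "spam")  # forgotten, re-learned as spam
skipped = not db.learn("<1@example.org>", ["viagra", "cheap"], "spam")
print(db.nspam, db.nham, skipped)
```

Learning only changes counters; nothing here stores a "signature" of the whole message.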

...and the following 

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain
number of ham (non-spam) and spam have been learned. The default is 200 of
each ham and spam, but you can tune these up or down with these two
settings. 

I changed the value to 1 (I use it for testing and self-learning; it's my
homework). As I understand it, Bayes learning for spam was thereby
activated. When I use sa-learn, Bayes learns that the mail is spam, and
Bayes learns the signatures...

So it is strange to me that when I send the same mail again, Bayes doesn't
mark this mail as spam. I don't understand this. I meet all the conditions:
sa-learn --spam --file mail, and bayes_min_spam_num 1. The modification date
of the database changed too (but the size stayed the same). nspam was
increased... I really don't understand what SA-LEARN is for! I have the
feeling that Bayes doesn't work correctly, that Bayes ignores sa-learn.
Perhaps I am silly, but I don't understand how it works :(( I want to know
how to tell Bayes: THIS MAIL IS SPAM (by using sa-learn), SO WHEN THIS SAME
MAIL COMES AGAIN, YOU HAVE TO MARK IT AS SPAM! I know that Bayes finds
similar elements between mails and decides accordingly. But when I mark a
mail as spam and the next mail is 100% identical, Bayes HAS TO mark it as
SPAM. That seems logical to me.




Re: SA-learn (spamassassin)

2009-08-02 Thread RW
On Sun, 2 Aug 2009 04:36:34 -0700 (PDT)
monolit xmull...@gmail.com wrote:

 

 I changed the value to 1 (I use it for testing and self-learning; it's
 my homework). As I understand it, Bayes learning for spam was thereby
 activated. When I use sa-learn, Bayes learns that the mail is spam, and
 Bayes learns the signatures...
 
 So it is strange to me that when I send the same mail again, Bayes
 doesn't mark this mail as spam. I don't understand this.

What you said before was that you corrected autolearn=ham to spam with
sa-learn, and that another similar spam then also got autolearn=ham.

Autolearning is not based on the Bayes result; it's based on the other
SpamAssassin rules. However, it won't autolearn in the opposite direction
to a strong Bayes result, which is why it's a good idea to train manually
first.

My guess is that you've fed it your one spam but haven't fed it enough ham
to satisfy bayes_min_ham_num, so there is no Bayes result and nothing to
stop autolearning in the wrong direction.
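RW's description of autolearning as a rough Python sketch. The thresholds are assumed to be the SpamAssassin 3.x defaults of bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0); the `strong` cutoff and the function name are hypothetical, and the real plugin applies further conditions (as far as I know it also wants points from both header and body rules before learning spam) that are left out here.

```python
# Assumed SpamAssassin 3.x defaults; check the Mail::SpamAssassin::Conf docs.
THRESHOLD_NONSPAM = 0.1   # bayes_auto_learn_threshold_nonspam
THRESHOLD_SPAM = 12.0     # bayes_auto_learn_threshold_spam

def autolearn_decision(non_bayes_score, bayes_prob=None, strong=0.05):
    """Return "ham", "spam" or "no" (hypothetical, simplified).

    non_bayes_score: score from all the other rules, Bayes rules excluded.
    bayes_prob: Bayes spam probability, or None if Bayes is not active yet.
    strong: hypothetical cutoff for a "strong" Bayes verdict.
    """
    if non_bayes_score < THRESHOLD_NONSPAM:
        wanted = "ham"
    elif non_bayes_score >= THRESHOLD_SPAM:
        wanted = "spam"
    else:
        return "no"  # in between: not learned either way
    if bayes_prob is not None:
        # Do not learn against a strong Bayes result in the opposite direction.
        if wanted == "ham" and bayes_prob >= 1.0 - strong:
            return "no"
        if wanted == "spam" and bayes_prob <= strong:
            return "no"
    return wanted

print(autolearn_decision(-1.0))        # no Bayes result: learned as ham
print(autolearn_decision(-1.0, 0.99))  # Bayes strongly says spam: not learned
print(autolearn_decision(15.0, 0.5))   # high score, neutral Bayes: learned as spam
```

With bayes_prob=None (Bayes not active yet, as in RW's guess), nothing stops a low-scoring spam from being autolearned as ham.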

It really is pretty useless to speculate about what it's doing when you
are misusing it like this. If you just want to play with it, feed it 10
hams and 10 spams and set the limits to 10. It won't be very accurate, but
it should behave sensibly.


Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
 On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:

   when I taught Bayes by hand (sa-learn --spam --file mail) that this
   mail is spam? I have explicitly set bayes_min_spam_num 1 in local.cf.
   As I understand it, this means one mail is sufficient for Bayes to
   learn. But it doesn't work.

  Do NOT do that.
  
  Unless you *really* understand the implications. Which you don't.
  It's a default for a reason.
  
  It's a counter-measure against bad learning, to force at least some
  MINIMAL manual training, before auto-learning kicks in. You just side-
  stepped that.
 
 AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
 bayes_min_ham_num control when scoring starts.

Well, it *does* nonetheless. *shrug*

As per the docs, that threshold controls when Bayes activates. Nothing
more, nothing less. Want to see for yourself?


$ echo | spamassassin --cf='score EMPTY_MESSAGE 6' --cf='score MISSING_DATE 6'

X-Spam-Status: Yes, score=17.3 required=8.0 tests=EMPTY_MESSAGE,MISSING_DATE,
  MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
  NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5

$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  2  0  non-token data: nspam
0.000  0  1  0  non-token data: nham
0.000  0 20  0  non-token data: ntokens





Re: SA-learn (spamassassin)

2009-08-02 Thread RW
On Sun, 02 Aug 2009 17:15:52 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

 On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
  On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:

   It's a counter-measure against bad learning, to force at least
   some MINIMAL manual training, before auto-learning kicks in. 

  AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
  bayes_min_ham_num control when scoring starts.
 
 Well, it *does* nonetheless. *shrug*
 
 As per the docs, that threshold controls when Bayes activates. Nothing
 more, nothing less. Want to see for yourself?
..
 X-Spam-Status: Yes, score=17.3 required=8.0
 tests=EMPTY_MESSAGE,MISSING_DATE,
 MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
 NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5
 

If you read back you'll see that that's consistent with what I wrote and
the opposite of what you wrote.

I said that the limits don't affect autolearning, just scoring
(activation).

Whatever you think you wrote, what you actually wrote was:

  to force at least some MINIMAL manual training, before
   auto-learning kicks in

There's no ambiguity there: the word "force" implies that manual training
is a prerequisite to auto-learning.



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 18:31 +0100, RW wrote:
   AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
   bayes_min_ham_num control when scoring starts.
  
  Well, it *does* nonetheless. *shrug*

 If you read back you'll see that that's consistent with what I wrote and
 the opposite of what you wrote.

Nah, I did set the thresholds to 1. :)

 I said that the limits don't affect autolearning, just scoring
 (activation).

Damn. My test-case was non-conclusive, I failed to crosscheck. :/

You are correct, auto-learning is not affected by these thresholds. SA
does bootstrap Bayes training, even if nspam/nham still is below the
limits. Sorry, my bad.





Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

From the SA website:
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain
number of ham (non-spam) and spam have been learned. The default is 200 of
each ham and spam, but you can tune these up or down with these two
settings. 

I have a theory... I know you will think it's bad, but I tried to explain how
I understand the SA documentation. When I set bayes_min_spam_num to 1, it
means that the Bayes learning system is activated. Now, for example: I get
a mail. I use sa-learn --spam --file mail. SA saves the mail (or some
signature) to the database. And when I get the same mail again, Bayes looks
into the database and says: this is the same mail as the one in my database
that is marked as spam, and it marks the mail as spam. To me that is
logical. What is strange is that when I use SA-LEARN, the database doesn't
grow in size, but its modification time is the time when I started
sa-learn.



Re: SA-learn (spamassassin)

2009-08-02 Thread me

On Sun, 2 Aug 2009 11:53:53 -0700 (PDT), monolit xmull...@gmail.com
wrote:

 What is strange is that when I use SA-LEARN, the database doesn't grow in
 size, but its modification time is the time when I started sa-learn.

What is the question?




Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

The question is logical. When SA learns new spam/ham, SA has to write new
information to the database, and I think the database has to grow in size.
If you have, for example, a *.doc file and you modify it by adding several
words, the *.doc file will get bigger (increase in size).



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 11:53 -0700, an anonymous Nabble user wrote:
 I have a theory... I know you will think it's bad, but I tried to explain how
 I understand the SA documentation. When I set bayes_min_spam_num to 1, it
 means that the Bayes learning system is activated. Now, for example: I get

As I just settled with RW in this very thread, the number of spam exceeding
the bayes_min_spam_num value does not activate Bayes *learn*ing. It means
that Bayes will classify mail -- based on what it learned before.

Learning, whether manual or automatic, always is available if use_bayes
and bayes_auto_learn are enabled.

The bayes_min_(ham|spam)_num values ONLY control how many messages
Bayes needs to have learned before it should start classifying mail.
And again, 1 is not a sane number.
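The activation rule in a few lines of Python, as a sketch (`bayes_can_classify` is a hypothetical name, and the >= comparison is my reading of the docs). Both counts must reach their thresholds, so lowering only bayes_min_spam_num to 1 leaves the ham threshold at its default of 200.

```python
def bayes_can_classify(nspam, nham, min_spam=200, min_ham=200):
    """Hypothetical helper: True once BOTH learned counts reach their thresholds
    (bayes_min_spam_num / bayes_min_ham_num). Learning itself, manual or
    automatic, is not gated by these values."""
    return nspam >= min_spam and nham >= min_ham

# Counts from the sa-learn --dump magic output in this thread: 67 spam, 29 ham.
print(bayes_can_classify(67, 29))              # defaults: False
print(bayes_can_classify(67, 29, min_spam=1))  # ham still below 200: False
print(bayes_can_classify(67, 29, 10, 10))      # RW's suggested limits of 10: True
```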


 a mail. I use sa-learn --spam --file mail. SA saves the mail (or some
 signature) to the database. And when I get the same mail again, Bayes looks
 into the database and says: this is the same mail as the one in my database
 that is marked as spam, and it marks the mail as spam. To me that is logical.

No. *sigh*  I did explain this earlier today. This is NOT how Bayes
works. Bayes does NOT keep signatures of entire messages. Instead, it
keeps track of *tokens*, and the number they have been seen in ham or
spam. Think of tokens as words.

Please do read up on Bayes. And please stop re-iterating this false
assumption.

Given you repeating some signature of a message, and your other thread
regarding Razor (which does actually calculate some signatures for a
message) -- I have a feeling you are confusing Bayes with Razor. They
are entirely unrelated and do not use the same mechanisms.


 What is strange is that when I use SA-LEARN, the database doesn't grow in
 size, but its modification time is the time when I started sa-learn.

It is a database. It is not a flat text file. There is nothing strange
about updating values in a database, and not seeing it inflate
proportional to your input data.





Re: SA-learn (spamassassin)

2009-08-02 Thread Benny Pedersen

On Sun, 2 Aug 2009 13:20:41 -0700 (PDT), monolit xmull...@gmail.com
wrote:
 The question is logical.

So is Google :)

 When SA learns new spam/ham, SA has to write new information to the
 database, and I think the database has to grow in size.

No. My Bayes DB is around 150M, but all my mail in webmail is 800M, so
where is the rest in Bayes? :)

 If you have, for example, a *.doc file and you modify it by adding several
 words, the *.doc file will get bigger (increase in size).

If you use Bayes on MySQL and dump the data, you will see that it doesn't
just add new words; it also counts how often each word is seen in spam vs.
ham. And the words are not stored as we write them here; they are encoded
into signatures that don't use much room in the DB.

One example: try taking the md5 sum of your email address. It will be the
same length every time, no matter how many characters your address has.
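Benny's md5 comparison, as a quick Python check. The second helper, `short_key`, is a hypothetical illustration of storing a token as a short fixed-size hash; as far as I know SpamAssassin 3.x does something similar with a truncated SHA-1 digest, but treat the 5-byte size as an assumption.

```python
import hashlib

def md5_hex(text):
    """MD5 hex digest: always 32 hex characters, whatever the input length."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def short_key(token, nbytes=5):
    """Hypothetical fixed-size token key: the last nbytes of a SHA-1 digest."""
    return hashlib.sha1(token.encode("utf-8")).digest()[-nbytes:]

addresses = ["a@b.cz", "monolit@example.com",
             "a.much.longer.address@mail.subdomain.example.org"]
for addr in addresses:
    print(f"{len(addr):3d} chars -> md5 has {len(md5_hex(addr))} hex chars")
print({len(short_key(a)) for a in addresses})  # every key has the same size
```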



Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

To Benny Pedersen: I understand your explanation of the size of the
SpamAssassin database. Your md5 example is clear. OK, thank you very much!

To Karsten Bräckelmann-2: I want to apologize for my approach. I use Ubuntu
and other forums because I am desperate: my homework was to install,
configure and run an antispam setup (spamassassin, ClamAV, Clamsmtp, razor,
postfix). Now I am under pressure because tomorrow I have to deliver my
solution to my chief... I must explain to him how it works and so on.

"the number of spam exceeding the bayes_min_spam_num value does not activate
Bayes *learn*ing. It means that Bayes will classify mail -- based on what it
learned before." "it keeps track of *tokens*, and the number they have been
seen in ham or spam." Your explanation is confusing for me, because you
claim the value of min_spam_num means that Bayes will classify mail "based
on what it learned before". My min_spam_num value is 1. I get the first
mail: Subject: viagra; body: viagra. I use sa-learn --spam for this mail. I
get a new mail: Subject: viagra; body: viagra. What will Bayes do, according
to you? Keep in mind your words:
"The bayes_min_(ham|spam)_num values ONLY control how many messages Bayes
needs to have learned before it should start classifying mail." = my Bayes
can classify mail (because the min_spam_num value is 1, the condition is
met). What now? Will my new mail be marked as spam? Or will it get a higher
score...?


"And again, 1 is not a sane number." - I am trying to explain to you that
this is only homework. Why the number 1? Because I want to see with my own
eyes how Bayes works. I don't have time to find a lot of real spam (I know
the number must be much bigger, about 1000; that's OK, I knew it).



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 14:43 -0700, an anonymous Nabble user wrote:
 To Karsten Bräckelmann-2: I want to apologize for my approach. I use Ubuntu
 and other forums because I am desperate: my homework was to install,
 configure and run an antispam setup (spamassassin, ClamAV, Clamsmtp, razor,
 postfix). Now I am under pressure because tomorrow I have to deliver my
 solution to my chief... I must explain to him how it works and so on.

Good luck with that.


Utterly fucked-up quoting, err, dumping of previous posts intermixed
with comments, fixicated.

  the number of spam exceeding the bayes_min_spam_num value does not activate
  Bayes *learn*ing. It means that Bayes will classify mail -- based on what it
  learned before.

  it keeps track of *tokens*, and the number they have been seen in ham
  or spam.

 Your explanation is confusing for me, because you claim the value of
 min_spam_num means that Bayes will classify mail "based on what it learned
 before". My min_spam_num value is 1. I get the first mail: Subject: viagra;
 body: viagra. I use sa-learn --spam for this mail. I get a new mail:
 Subject: viagra; body: viagra. What will Bayes do, according to you? Keep
 in mind your words:

Bayes will check the tokens against its database. Based on the number of
occurrences of each token in ham and spam, Bayes will return whether the
mail appears spammy or hammy (based on what it learned before), and its
confidence of that assessment.

This classification (ham or spam) and confidence will be scored by SA.
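To make "spammy or hammy, plus a confidence" concrete, here is a Python sketch using Gary Robinson's smoothed per-token probability and Fisher (chi-square) combining. SpamAssassin's Bayes code is, as far as I know, built on the same family of techniques, but its constants and details differ; s=1.0 and x=0.5 below are assumptions. It shows why a single self-written training message gives only a weak result.

```python
import math

def token_prob(spam_count, ham_count, nspam, nham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability of one token.
    s (strength) and x (prior for unknown tokens) are assumed values."""
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    n = spam_count + ham_count
    p = spam_ratio / (spam_ratio + ham_ratio) if n else x
    return (s * x + n * p) / (s + n)

def chi2q(x2, v):
    """P(chi-square with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Fisher/chi-square combining: near 1 = spammy, near 0 = hammy, 0.5 = no idea."""
    n = len(probs)
    if n == 0:
        return 0.5
    h = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + (h - s)) / 2.0

nspam, nham = 1, 1                        # one learned spam, one learned ham
viagra = token_prob(1, 0, nspam, nham)    # seen once, in spam only
unknown = token_prob(0, 0, nspam, nham)   # never seen: stays at the prior
weak = combine([viagra] + [unknown] * 20)  # one known token among 20 unknown ones
strong = combine([token_prob(100, 0, 100, 100)] * 5)
print(round(viagra, 3), round(weak, 3), round(strong, 3))
```

One learned message moves each of its tokens only to 0.75, and unknown tokens stay at 0.5, so the combined result barely leaves "no idea".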

Keep in mind there are a LOT more tokens in a message than merely the
words in the Subject and Body. This DOES have a severe impact on your
results, if your test spam is a self-generated message with the word
Viagra as Subject and Body. Nope, this is not a proper test environment.


  The bayes_min_(ham|spam)_num values ONLY control how many messages
  Bayes needs to have learned before it should start classifying mail.

 = my Bayes can classify mail (because the min_spam_num value is 1, the
 condition is met). What now? Will my new mail be marked as spam? Or will it
 get a higher score...?

It will be classified (by Bayes) based on the tokens in the message and
the previously learned statistics. Bayes does NOT only mark spam. It
also can report a message to look like ham.

Anyway, I asked you before to provide sa-learn --dump magic output. You
didn't. Given the intro, I seriously wonder whether the user you train
Bayes as and the user scanning mail are the same anyway.





Re: SA-learn (spamassassin)

2009-08-02 Thread Matt Kettler
monolit wrote:
 The question is logical. When SA learns new spam/ham, SA has to write new
 information to the database, and I think the database has to grow in size.
 If you have, for example, a *.doc file and you modify it by adding several
 words, the *.doc file will get bigger (increase in size).
   
The database doesn't need to grow in size.

A Berkeley DB file can contain free space. This is done to avoid
constantly shrinking and growing the file on disk. Deleted elements are
merely marked as free space for later use.

Therefore, data can be added to a Berkeley DB file without an increase
in file size.
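A toy Python illustration of Matt's point (not Berkeley DB's real on-disk format): a file pre-allocated with fixed-size record slots, where learning updates a record in place or fills a free slot, so the file size stays the same. The record layout (5-byte key, two counters) and the helper names are hypothetical.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<5sII")  # hypothetical: 5-byte token key, spam count, ham count
EMPTY_KEY = b"\0" * 5            # marks a free slot

def create(path, slots):
    """Pre-allocate empty records, like free space kept inside a DB file."""
    with open(path, "wb") as f:
        f.write(RECORD.pack(EMPTY_KEY, 0, 0) * slots)

def upsert(path, key, dspam, dham):
    """Add to an existing record in place, or reuse the first free slot."""
    with open(path, "r+b") as f:
        data = f.read()
        free = None
        for off in range(0, len(data), RECORD.size):
            k, nspam, nham = RECORD.unpack_from(data, off)
            if k == key:
                f.seek(off)
                f.write(RECORD.pack(key, nspam + dspam, nham + dham))
                return
            if k == EMPTY_KEY and free is None:
                free = off
        if free is None:
            raise RuntimeError("no free slot left; a real database would grow now")
        f.seek(free)
        f.write(RECORD.pack(key, dspam, dham))

def lookup(path, key):
    """Return (spam count, ham count) for key, or None."""
    with open(path, "rb") as f:
        data = f.read()
    for off in range(0, len(data), RECORD.size):
        k, nspam, nham = RECORD.unpack_from(data, off)
        if k == key:
            return nspam, nham
    return None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_toks.db")
    create(path, slots=64)
    size_before = os.path.getsize(path)
    upsert(path, b"viagr", 1, 0)  # stand-ins for hashed token keys
    upsert(path, b"viagr", 1, 0)
    upsert(path, b"cheap", 0, 1)
    size_after = os.path.getsize(path)
    counts = lookup(path, b"viagr")
print(size_before, size_after, counts)
```

The modification time changes on every update, but the size only changes once the free slots run out.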



SA-learn (spamassassin)

2009-08-01 Thread monolit

Hello, I found out the following: my spamd daemon is running as root.
But I have the following lines in master.cf (the Postfix configuration
file):

# Postfix master process configuration file. For details on the format
# of the file, see the master(5) manual page (command: man 5 master).
#
# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ==========================================================================

smtp      inet  n       -       n       -       -       smtpd
  -o content_filter=spamfilter:dummy

# ==========================================================================
# Interfaces to non-Postfix software. Be sure to examine the manual
# pages of the non-Postfix software to find out what options it wants.
#
# Many of the following services use the Postfix pipe(8) delivery
# agent. See the pipe(8) man page for information about ${recipient}
# and other message envelope options.
# ==========================================================================

spamfilter unix -       n       n       -       -       pipe
  flags=Rq user=spamfilter argv=/usr/local/bin/spamfilter -f ${sender} -- ${recipient}

spamfilter is the user for SpamAssassin (spamd) (but it is strange to me that
spamd is running as root). I configured master.cf according to
h-t-t-p://onetforum.com/fourm/viewtopic.php?p=27 (Kalinga's Community Support
Forum • View topic - Integrating Spam Assassin with Postfix; replace h-t-t-p
with http). It is recommended by the original SpamAssassin web pages.


In local.cf I have: bayes_path /home/spamfilter/.spamassassin/bayes.

And now, when I send a mail (for example at 21:00) which SpamAssassin marks
as autolearn=spam, and I look at /home/spamfilter/.spamassassin/bayes, I can
see that the files bayes_toks and bayes_seen were modified at 21:00, but
their size didn't change. How is that possible? When SpamAssassin changes
the files, they should grow... When I run sa-learn --dump magic, I can see
that the nspam row increased by 1. This confirms that autolearn works (but
the database doesn't grow).

My second problem: I get a mail marked autolearn=ham. I take the mail and
run the following command: sa-learn --spam --file mail (at 21:55). When I
run sa-learn --dump magic, I can see that nspam increased by 1; that's OK.
But when I look in /home/spamfilter/.spamassassin, I can see that the
database files changed but their size didn't. Is that normal???

And the last problem: when I get a mail marked autolearn=ham, I tried
running sa-learn --spam --file mail. When I get the same mail again,
SpamAssassin again marks it autolearn=ham. How is that possible, when I
taught Bayes by hand (sa-learn --spam --file mail) that this mail is spam?
I have explicitly set bayes_min_spam_num 1 in local.cf. As I understand it,
this means one mail is sufficient for Bayes to learn. But it doesn't work.

Thanks for your advice (I really need it).




Sorry for my terrible English.



Re: SA-learn (spamassassin)

2009-08-01 Thread Karsten Bräckelmann
On Sat, 2009-08-01 at 16:13 -0700, an anonymous Nabble user wrote:
 And the last problem: when I get a mail marked autolearn=ham, I tried
 running sa-learn --spam --file mail. When I get the same mail again,
 SpamAssassin again marks it autolearn=ham. How is that possible, when I
 taught Bayes by hand (sa-learn --spam --file mail) that this mail is spam?
 I have explicitly set bayes_min_spam_num 1 in local.cf. As I understand it,
 this means one mail is sufficient for Bayes to learn. But it doesn't work.

Do NOT do that.

Unless you *really* understand the implications. Which you don't. It's a
default for a reason.

It's a counter-measure against bad learning, to force at least some
MINIMAL manual training, before auto-learning kicks in. You just side-
stepped that.

You should read some docs on Bayes, before messing with its settings.





Re: SA-learn (spamassassin)

2009-08-01 Thread RW
On Sun, 02 Aug 2009 01:42:21 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

 On Sat, 2009-08-01 at 16:13 -0700, an anonymous Nabble user wrote:
  And the last problem: when I get a mail marked autolearn=ham, I tried
  running sa-learn --spam --file mail. When I get the same mail again,
  SpamAssassin again marks it autolearn=ham. How is that possible

It's not the same spam, it'll have different headers.

  when I taught Bayes by hand (sa-learn --spam --file mail) that this
  mail is spam? I have explicitly set bayes_min_spam_num 1 in local.cf.
  As I understand it, this means one mail is sufficient for Bayes to
  learn. But it doesn't work.

It's not like Pyzor, where you set a threshold; it's a statistical
filter. You have to feed it hundreds of mails before it produces
reliable results, hence the 200 spam minimum.

 Do NOT do that.
 
 Unless you *really* understand the implications. Which you don't.
 It's a default for a reason.
 
 It's a counter-measure against bad learning, to force at least some
 MINIMAL manual training, before auto-learning kicks in. You just side-
 stepped that.

AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
bayes_min_ham_num control when scoring starts.