Re: Razor, spamassassin - network test

2009-08-02 Thread monolit

I am really sorry, it was a mistake. I was very tired yesterday.

Back on-list.  I'm not a personal help-line.

When I use spamassassin -t -D razor2 < /tmp/spam I don't get the hash and
so on, only the content analysis details (Bayes classification and so on).
I expected a message like:

  debug: Razor is available
  debug: Razor Agents 1.20, protocol version 2.
  debug: Read server list from /home/jgb/.razor.lst
  debug: 72636 seconds before closest server discovery
  debug: Closest server is 209.204.62.150
  debug: Connecting to 209.204.62.150...
  debug: Connection established
  debug: Signature: 48e74b8496877ba45072b201b41eebed7038186b
  debug: Server version: 1.11, protocol version 2
  debug: Server response: Negative 48e74b8496877ba45072b201b41eebed7038186b
  debug: Message 1 NOT found in the catalogue

I have no idea how to get Razor working. This command (spamassassin -t -D
razor2 < /tmp/spam) is without --lint and is the one recommended on the
SpamAssassin web pages. I am a beginner in this field, so I need accurate
advice.
Thanks for your help.


-- 
View this message in context: 
http://www.nabble.com/Razor%2C-spamassassin---network-test-tp24773506p24776602.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: blacklisting a forger

2009-08-02 Thread mouss
Terry Carmen wrote:
 On Sat, 1 Aug 2009 19:33:40 -0400
 Terry Carmen te...@cnysupport.com wrote:

 The backscatter would not have been received, since the sender is on
 a number of RBLs.
 It's the IP address of the botnet PC that's on the RBLs, the backscatter
 doesn't come from there, it comes from the recipients of the spam.

 See:  http://en.wikipedia.org/wiki/Backscatter_(e-mail)
 
 Regardless of whether or not the message was backscatter, The sending system
 (triband-mum-59.184.51.13.mtnl.net.in) is blacklisted,
 

- bot at triband-* sent junk to silly.server.example.
- silly.server.example didn't reject it. instead it bounced it to OP
- the bounce includes infos about which host sent the original junk to
silly.server.example, and this is triband-*

so for OP, this is backscatter, and RBL/DNSBL is of no help.


Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

I read the SpamAssassin docs and found the following:
sa-learn
--spam
Learn the input message(s) as spam. If you have previously learnt any of
the messages as ham, SpamAssassin will forget them first, then re-learn them
as spam. Alternatively, if you have previously learnt them as spam, it'll
skip them this time around. If the messages have already been filtered
through SpamAssassin, the learner will ignore any modifications SpamAssassin
may have made. 

...and the following 

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain
number of ham (non-spam) and spam have been learned. The default is 200 of
each ham and spam, but you can tune these up or down with these two
settings. 

I changed the value to 1 (this is for testing and self-study; it's my
homework). As I understand it, Bayes learning was then activated. When I
run sa-learn, Bayes learns that the mail is spam, and Bayes learns the
signatures...

So it seems strange to me that when I send the same mail again, Bayes
doesn't mark it as spam. I don't understand this. I met all the conditions:
sa-learn --spam --file mail, and bayes_min_spam_num 1. The modification date
of the database changed too (but the size stayed the same), and nspam
increased... I really don't understand what sa-learn is for! I have the
feeling that Bayes doesn't work correctly and ignores sa-learn. Perhaps I
am being silly, but I don't understand how it works :(( I want to know how
to tell Bayes, using sa-learn, THIS MAIL IS SPAM, SO WHEN THE SAME MAIL
COMES AGAIN YOU HAVE TO MARK IT AS SPAM! I know that Bayes finds similar
elements between mails and decides accordingly. But when I mark a mail as
spam and the next mail is 100% similar, Bayes HAS TO mark it as SPAM. That
seems logical to me.




Re: SA-learn (spamassassin)

2009-08-02 Thread RW
On Sun, 2 Aug 2009 04:36:34 -0700 (PDT)
monolit xmull...@gmail.com wrote:

 

 I changed the value to 1 (this is for testing and self-study; it's my
 homework). As I understand it, Bayes learning was then activated. When
 I run sa-learn, Bayes learns that the mail is spam, and Bayes learns
 the signatures...
 
 So it seems strange to me that when I send the same mail again, Bayes
 doesn't mark it as spam. I don't understand this.

What you said before was that you corrected autolearn=ham to spam with
sa-learn, and that another similar spam then also had autolearn=ham.

Autolearning is not based on the Bayes result; it's based on other
SpamAssassin rules. However, it won't autolearn in the opposite
direction to a strong Bayes result, which is why it's a good idea to
train manually first.

My guess is that you've fed it your one spam, but haven't fed it enough
ham to satisfy bayes_min_ham_num, so there is no Bayes result and
nothing to stop autolearning in the wrong direction.

It really is pretty useless to speculate about what it's doing when
you are misusing it like this. If you just want to play with it, then
feed it 10 hams and 10 spams and set the limits to 10. It won't be very
accurate, but it should behave sensibly.
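
The gate described above can be pictured with a minimal sketch. This is a toy model, not SpamAssassin's actual code; the function and parameter names are made up for illustration, and the defaults of 200 are the documented ones quoted earlier in the thread.

```python
# Toy model of the bayes_min_ham_num / bayes_min_spam_num gate.
# Names are illustrative only, not SpamAssassin internals.

def bayes_can_score(nspam, nham, min_spam=200, min_ham=200):
    """Return True once enough ham AND enough spam have been learned."""
    return nspam >= min_spam and nham >= min_ham

# One learned spam and no ham, both limits lowered to 1: still no Bayes result.
print(bayes_can_score(nspam=1, nham=0, min_spam=1, min_ham=1))      # False
# Ten of each with the limits set to 10: Bayes starts scoring.
print(bayes_can_score(nspam=10, nham=10, min_spam=10, min_ham=10))  # True
```

Note that both counters have to reach their limit; lowering only the spam limit leaves the ham limit at its default.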


Re: Network Tests / Rule Files Directories

2009-08-02 Thread Karsten Bräckelmann
On Sat, 2009-08-01 at 18:15 -0700, Stefan Malte Schumacher wrote:
  Evidence that it's not working? Show us some SA headers. In this case, a
  spam sample that triggered DCC, cause the Report header does show the
  rule's score.

Hmm, I wasn't clear enough. :)  I meant an identified spam, where the
Report header is added. It isn't with that sample. Anyway...

 Here is an example with Razor2, but I guess the underlying problem is the
 same. 
 
 http://www.pagan.mynetcologne.de/example-email

X-Spam-Status: No, score=2.2 required=5.0 tests=AWL,HTML_IMAGE_RATIO_04,
  HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK,
  UNPARSEABLE_RELAY autolearn=no version=3.2.5

 As you can see, the message only gets a score of 2.2. In the beginning I
 believed that I made some embarrassing mistake with the rules concerning the
 network checks, but if you say these are okay the problem most likely lies
 somewhere else. 

AWL. Obviously, it counters the custom scores, based on the sender's
history. And it seems the scores have been really low in the past.

  spamassassin -t < sample

What does that say at the bottom of the output, for this sample?
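
For readers unfamiliar with AWL, here is a rough sketch of how it pulls a score toward the sender's history. It assumes the formula given in the AWL plugin documentation (final = score + (mean - score) * factor, with factor defaulting to 0.5). The function name and the numbers are hypothetical and are not taken from this sample.

```python
def awl_adjust(score, sender_mean, factor=0.5):
    """Regress this message's score toward the sender's long-term mean."""
    return score + (sender_mean - score) * factor

# Hypothetical: the rules add up to 6.0, but the sender's past mail averaged -1.0.
print(awl_adjust(6.0, -1.0))  # 2.5
```

So a sender with a history of low-scoring mail can drag an otherwise spammy message back under the threshold.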


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
 On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:

   when I teach Bayes by hand (sa-learn --spam --file mail) that this
   mail is spam? I have explicitly set bayes_min_spam_num 1 in local.cf.
   As I understand it, this means that one mail is sufficient for Bayes
   to learn. But it doesn't work.

  Do NOT do that.
  
  Unless you *really* understand the implications. Which you don't.
  It's a default for a reason.
  
  It's a counter-measure against bad learning, to force at least some
  MINIMAL manual training, before auto-learning kicks in. You just side-
  stepped that.
 
 AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
 bayes_min_ham_num control when scoring starts.

Well, it *does* nonetheless. *shrug*

As per the docs, that threshold controls when Bayes activates. Nothing
more, nothing less. Want to see for yourself?


$ echo | spamassassin --cf='score EMPTY_MESSAGE 6' --cf='score MISSING_DATE 6'

X-Spam-Status: Yes, score=17.3 required=8.0 tests=EMPTY_MESSAGE,MISSING_DATE,
  MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
  NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5

$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  2  0  non-token data: nspam
0.000  0  1  0  non-token data: nham
0.000  0 20  0  non-token data: ntokens
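
If you want to pull nspam and nham out of that output programmatically, a minimal sketch follows. It assumes the layout shown in the sample above, with the value in the third column and the name after "non-token data: "; the function name is hypothetical.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to the integer in the third column."""
    prefix = "non-token data: "
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)   # four numeric columns, then the label
        if len(fields) == 5 and fields[4].startswith(prefix):
            values[fields[4][len(prefix):]] = int(fields[2])
    return values

sample = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  2  0  non-token data: nspam
0.000  0  1  0  non-token data: nham
0.000  0 20  0  non-token data: ntokens
"""

magic = parse_dump_magic(sample)
print(magic["nspam"], magic["nham"])   # 2 1
```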





Re: SA-learn (spamassassin)

2009-08-02 Thread RW
On Sun, 02 Aug 2009 17:15:52 +0200
Karsten Bräckelmann guent...@rudersport.de wrote:

 On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
  On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:

   It's a counter-measure against bad learning, to force at least
   some MINIMAL manual training, before auto-learning kicks in. 

  AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
  bayes_min_ham_num control when scoring starts.
 
 Well, it *does* nonetheless. *shrug*
 
 As per the docs, that threshold controls when Bayes activates. Nothing
 more, nothing less. Want to see for yourself?
..
 X-Spam-Status: Yes, score=17.3 required=8.0
 tests=EMPTY_MESSAGE,MISSING_DATE,
 MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
 NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5
 

If you read back you'll see that that's consistent with what I wrote and
the opposite of what you wrote.

I said that the limits don't affect autolearning, just scoring
(activation).

Whatever you think you wrote, what you actually wrote was:

  to force at least some MINIMAL manual training, before
   auto-learning kicks in

There's no ambiguity there: the use of the word "force" implies that
manual training is a prerequisite to auto-learning.



Re: Razor, spamassassin - network test

2009-08-02 Thread Karsten Bräckelmann
Getting kind of a headache, trying to wrap my head around this confusing
mess. Anyway, here's my shot at this.

On Sun, 2009-08-02 at 03:31 -0700, an anonymous Nabble user wrote:
   When I use spamassassin -t -D razor2 < /tmp/spam I don't get the
   hash and so on, only the content analysis details (Bayes
   classification and so on). I expected a message like:

The -D razor2 option limits debugging to Razor. There is no debugging
output for Bayes and so on.

I believe you're ONLY looking at the end. Which, due to the -t option,
indeed does show an additional Content Analysis at the end. The Razor
debugging however is at the TOP. Have a careful look at ALL the output,
not only the end.


 debug: Razor is available
 debug: Razor Agents 1.20, protocol version 2.
 debug: Read server list from /home/jgb/.razor.lst
 debug: 72636 seconds before closest server discovery
 debug: Closest server is 209.204.62.150
 debug: Connecting to 209.204.62.150...
 debug: Connection established
 debug: Signature: 48e74b8496877ba45072b201b41eebed7038186b
 debug: Server version: 1.11, protocol version 2
 debug: Server response: Negative 48e74b8496877ba45072b201b41eebed7038186b
 debug: Message 1 NOT found in the catalogue

This is a straight copy from the wiki [1], explaining how to test Razor
is working. However, it's an *old* snippet. Do run the command and have
a look at the Razor debug output at the top.

It will be different, cause this snippet is really, really old. Note the
version and protocol. But it will get you all the debugging output.


 I have no idea how to get Razor working. This command (spamassassin -t
 -D razor2 < /tmp/spam) is without --lint and is the one recommended on
 the SpamAssassin web pages. I am a beginner in this field, so I need
 accurate advice.

That command is correct.


[1] http://wiki.apache.org/spamassassin/RazorHowToTell




Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 18:31 +0100, RW wrote:
   AFAIK it doesn't affect autolearning at all; bayes_min_spam_num and
   bayes_min_ham_num control when scoring starts.
  
  Well, it *does* nonetheless. *shrug*

 If you read back you'll see that that's consistent with what I wrote and
 the opposite of what you wrote.

Nah, I did set the thresholds to 1. :)

 I said that the limits don't affect autolearning, just scoring
 (activation).

Damn. My test case was inconclusive, I failed to crosscheck. :/

You are correct, auto-learning is not affected by these thresholds. SA
does bootstrap Bayes training, even if nspam/nham still is below the
limits. Sorry, my bad.





Re: Razor, spamassassin - network test

2009-08-02 Thread monolit

I understand that I must read the whole output (the TOP of the message).
But the output of this command scrolls by very fast and stops at the end,
so I can't catch the TOP of the message. I tried piping it to more
(| more) but it didn't help. I tried redirecting the output to a file,
but that doesn't work either; the file was empty :( I don't know how to
read the TOP of the output.

The last thing from the SpamAssassin web site is:

Edit your spamd start-up script, or start-up options file (depending on
which OS you're running, these may be different). There should be a -L or
--local switch in that file. Remove it to enable network tests.

I can't find the file with this switch. I use the CentOS distro.
 



Re: Razor, spamassassin - network test

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 11:17 -0700, monolit wrote:
 I understand that I must read the whole output (the TOP of the message).
 But the output of this command scrolls by very fast and stops at the
 end, so I can't catch the TOP of the message. I tried piping it to more
 (| more) but it didn't help. I tried redirecting the output to a file,
 but that doesn't work either; the file was empty :( I don't know how to
 read the TOP of the output.

You mean, your terminal does not have a scroll-back buffer? You can't
simply go back a few pages?

Well, then try redirecting STDERR, instead of STDOUT only. That's where
the debugging messages are.

  spamassassin -D razor2 < sample.msg 2>&1 | less
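
Why a plain redirect to a file can miss the debug lines is easy to reproduce without SpamAssassin. The sketch below uses a stand-in child process (a one-line Python script, purely hypothetical) that writes one line to each stream, then captures it with and without merging STDERR into STDOUT.

```python
import subprocess
import sys

# Stand-in child process (not spamassassin): one line on each stream.
child = [sys.executable, "-c",
         "import sys; print('report'); print('debug: hello', file=sys.stderr)"]

# Capturing STDOUT only misses the debug line entirely.
only_stdout = subprocess.run(child, stdout=subprocess.PIPE,
                             stderr=subprocess.DEVNULL, text=True).stdout

# Merging STDERR into STDOUT (what 2>&1 does in the shell) captures both.
merged = subprocess.run(child, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True).stdout

print("debug: hello" in only_stdout)  # False
print("debug: hello" in merged)       # True
```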


 Edit your spamd start-up script, or start-up options file (depending on
 which OS you're running, these may be different). There should be a -L or
 --local switch in that file. Remove it to enable network tests.
 
 I can't find the file with this switch. I use the CentOS distro.

This (a) applies to spamd only, not to running the 'spamassassin' script
as you do right now, and (b) matters only if network tests have been
explicitly disabled in the daemon start-up script.





Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

From the SA web site:
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain
number of ham (non-spam) and spam have been learned. The default is 200 of
each ham and spam, but you can tune these up or down with these two
settings. 

I have a theory... I know you will think it's bad, but I am trying to
explain how I understand the SA documentation. When I set
bayes_min_spam_num 1, it means that the Bayes learning system is
activated. Now, for example: I get a mail. I run sa-learn --spam --file
mail. SA saves the mail (or some signature of it) to the database. When
I get the same mail again, Bayes looks in the database and says: this is
the same mail as one in my database that is marked as spam, and so it
marks the mail as spam. That seems logical to me. What is strange is
that when I use sa-learn the database doesn't grow in size, but its
modification time matches the time I ran sa-learn.



Re: SA-learn (spamassassin)

2009-08-02 Thread me

On Sun, 2 Aug 2009 11:53:53 -0700 (PDT), monolit xmull...@gmail.com
wrote:

 What is strange is that when I use sa-learn the database doesn't grow
 in size, but its modification time matches the time I ran sa-learn.

And the question is?




Re: Razor, spamassassin - network test

2009-08-02 Thread monolit

Your command works! In the output of

  spamassassin -D razor2 < sample.msg 2>&1 | less

I found the following:
check[9444]: [ 6] a=ce=4ep4=7542-10s=4uO_brp3_KWEDuqMYXBVHI-4-FwA
But I don't know how to tell whether that is the signature (hash) of the
mail. In the old version it was clearly marked, for example:
debug: Signature: 48e74b8496877ba45072b201b41eebed7038186b.

My second question: when I send mail, for example from XP station a) to
XP station b), SpamAssassin writes X-Spam-Status and so on to the mail
header. From that I can tell that the mail was checked using the SA
rules and Bayes (autolearn), but how can I tell that the mail was really
checked by Razor? There isn't any info in the mail header, nor in
razor.log (about checking the mail).



Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

The question is logical. When SA learns new spam/ham, it has to write new
info to the database, so I think the database has to grow in size. For
example, if you have a *.doc file and modify it by adding several words,
the *.doc file gets bigger.



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 11:53 -0700, an anonymous Nabble user wrote:
 I have a theory... I know you will think it's bad, but I am trying to
 explain how I understand the SA documentation. When I set
 bayes_min_spam_num 1, it means that the Bayes learning system is
 activated. Now, for example: I get

As I just settled with RW in this very thread, the number of spam
exceeding the bayes_min_spam_num value does not activate Bayes
*learn*ing. It means that Bayes will classify mail -- based on what it
learned before.

Learning, whether manual or automatic, always is available if use_bayes
and bayes_auto_learn are enabled.

The bayes_min_(ham|spam)_num values ONLY control, how many messages
Bayes needs to have learned, before it should start classifying mail.
And again, 1 is not a sane number.


 a mail. I run sa-learn --spam --file mail. SA saves the mail (or some
 signature of it) to the database. When I get the same mail again, Bayes
 looks in the database and says: this is the same mail as one in my
 database that is marked as spam, and so it marks the mail as spam. That
 seems logical to me.

No. *sigh*  I did explain this earlier today. This is NOT how Bayes
works. Bayes does NOT keep signatures of entire messages. Instead, it
keeps track of *tokens*, and the number of times they have been seen in
ham or spam. Think of tokens as words.

Please do read up on Bayes. And please stop re-iterating this false
assumption.
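
To make the token idea concrete, here is a minimal sketch. It is a toy model under loud assumptions: tokens are just lower-cased whitespace-separated words, and the names `learn` and `db` are invented for illustration. SpamAssassin's real tokenizer and storage are far more involved.

```python
from collections import Counter

def learn(db, text):
    """Count which tokens appeared; the message itself is not stored."""
    db["n"] += 1
    db["tokens"].update(set(text.lower().split()))

spam_db = {"n": 0, "tokens": Counter()}
learn(spam_db, "Subject: viagra viagra")
learn(spam_db, "Subject: cheap viagra today")

print(spam_db["n"])                  # 2
print(spam_db["tokens"]["viagra"])   # 2
print(len(spam_db["tokens"]))        # 4
```

Nothing in `spam_db` can answer "have I seen this exact message before?"; it only knows how often each token turned up in learned spam.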

Given that you keep repeating "some signature" of a message, and given
your other thread regarding Razor (which actually does calculate some
signatures for a message) -- I have a feeling you are confusing Bayes
with Razor. They are entirely unrelated and do not use the same
mechanisms.


 What is strange is that when I use sa-learn the database doesn't grow
 in size, but its modification time matches the time I ran sa-learn.

It is a database. It is not a flat text file. There is nothing strange
about updating values in a database, and not seeing it inflate
proportional to your input data.





Re: SA-learn (spamassassin)

2009-08-02 Thread Benny Pedersen

On Sun, 2 Aug 2009 13:20:41 -0700 (PDT), monolit xmull...@gmail.com
wrote:
 The question is logical.

so is google :)

 When SA learns new spam/ham, it has to write new info to the database,
 so I think the database has to grow in size.

no, my bayes db is around 150M, but all my mail in webmail comes to
800M, so where is the rest in bayes? :)

 For example, if you have a *.doc file and modify it by adding several
 words, the *.doc file gets bigger.

if you use bayes on mysql and dump the data, you will see that it does
not just add new words, it also counts how often each word is seen in
spam vs ham. and these words are not stored as we write them here, they
are encoded to signatures that don't use that much room in the db

one example: try taking the md5 sum of your email address, it will be
the same length every time no matter how many chars your address has
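
The md5 example is quick to try. The sketch below only demonstrates the fixed-length property of a hash; the addresses are made up, and it makes no claim about which hash the Bayes database actually uses.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

short = md5_hex("a@example.com")
long_ = md5_hex("a.much.longer.address@mail.example.org")

print(len(short), len(long_))  # 32 32
```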

-- 
Benny Pedersen


Re: Razor, spamassassin - network test

2009-08-02 Thread Karsten Bräckelmann
I'm starting to seriously wonder, what your homework actually is about.


On Sun, 2009-08-02 at 13:05 -0700, an anonymous Nabble user wrote:
 Your command works! In the output of spamassassin -D razor2 <
 sample.msg 2>&1 | less I found the following:
 check[9444]: [ 6] a=ce=4ep4=7542-10s=4uO_brp3_KWEDuqMYXBVHI-4-FwA
 But I don't know how to tell whether that is the signature (hash) of
 the mail. In

This is a question for the Razor community, don't you think?

(Hint: The Razor community is also not hosted at some Ubuntu help forum.
Where you previously posted these two threads, and then dumped a copy of
the forum-mangled text to the SA forum at Nabble.)

 the old version it was clearly marked for example:
 debug: Signature: 48e74b8496877ba45072b201b41eebed7038186b.

This hash is hexadecimal encoded, unlike the value above. A
cryptographic hash does not necessarily need to be encoded in hex.
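
The same raw digest can be written out in more than one way. The sketch below uses SHA-1 and URL-safe base64 purely as an illustration; it is an assumption for the example, not a statement about which hash or encoding Razor uses.

```python
import base64
import hashlib

raw = hashlib.sha1(b"example message body").digest()    # 20 raw bytes

as_hex = raw.hex()                                       # hex text
as_b64 = base64.urlsafe_b64encode(raw).decode("ascii")   # may contain '-' and '_'

print(len(raw), len(as_hex), len(as_b64))  # 20 40 28
```

Both strings decode back to the same 20 bytes; only the textual representation differs.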


 My second question: when I send mail, for example from XP station a) to
 XP station b), SpamAssassin writes X-Spam-Status and so on to the mail
 header. From that I can tell that the mail was checked using the SA
 rules and Bayes (autolearn), but how can I tell that the mail was
 really checked by Razor? There isn't any info in the mail header, nor
 in razor.log (about checking the mail).

If Razor is enabled in SA, SA will do the test. The rule gets hit (and
added to the Status header) only, if it is recognized as spam by Razor.

You probably would be able to define more rules, with an informational
score of 0.001, using a much wider range possibly covering all cases.
See 25_razor2.cf for the current rule.





Re: SA-learn (spamassassin)

2009-08-02 Thread monolit

To Benny Pedersen: I understand your explanation about the size of the
SpamAssassin database. Your example with md5 is clear. OK, thank you
very much!

To Karsten Bräckelmann: I want to apologize for my approach. I use
Ubuntu and other forums because I am desperate: my homework was to
install, configure and run an anti-spam setup (SpamAssassin, ClamAV,
clamsmtp, Razor, Postfix). Now I am under pressure because tomorrow I
have to deliver my solution to my boss... I must explain to him how it
works and so on.

"The number of spam exceeding the bayes_min_spam_num value does not
activate Bayes *learn*ing. It means that Bayes will classify mail --
based on what it learned before." "It keeps track of *tokens*, and the
number they have been seen in ham or spam." Your explanation is
confusing to me, because you claim that the value of min_spam_num means
that Bayes will classify mail based on what it learned before. My
min_spam_num value is 1. I get the first mail. Subject: viagra; body:
viagra. I run sa-learn --spam on this mail. I get a new mail: Subject:
viagra; body: viagra. What will Bayes do, according to you? Keep in mind
your words: "The bayes_min_(ham|spam)_num values ONLY control, how many
messages Bayes needs to have learned, before it should start classifying
mail." So my Bayes can classify mail (because the min_spam_num value is
1, the condition is met). What now? Will my new mail be marked as spam?
Or will it get a higher score...?

"And again, 1 is not a sane number." I am trying to explain to you that
this is only homework. Why the number 1? Because I want to see with my
own eyes how Bayes works. I don't have time to find many real spam mails
(I know the number should be bigger, about 1000; that's OK, I knew
that).



Re: SA-learn (spamassassin)

2009-08-02 Thread Karsten Bräckelmann
On Sun, 2009-08-02 at 14:43 -0700, an anonymous Nabble user wrote:
 To Karsten Bräckelmann: I want to apologize for my approach. I use
 Ubuntu and other forums because I am desperate: my homework was to
 install, configure and run an anti-spam setup (SpamAssassin, ClamAV,
 clamsmtp, Razor, Postfix). Now I am under pressure because tomorrow I
 have to deliver my solution to my boss... I must explain to him how it
 works and so on.

Good luck with that.


Utterly fucked-up quoting, err, dumping of previous posts intermixed
with comments, fixicated.

  the number of spam exceeding the bayes_min_spam_num value does not activate
  Bayes *learn*ing. It means that Bayes will classify mail -- based on what it
  learned before.

  it keeps track of *tokens*, and the number they have been seen in ham
  or spam.

 Your explanation is confusing to me, because you claim that the value
 of min_spam_num means that Bayes will classify mail based on what it
 learned before. My min_spam_num value is 1. I get the first mail.
 Subject: viagra; body: viagra. I run sa-learn --spam on this mail. I
 get a new mail: Subject: viagra; body: viagra. What will Bayes do,
 according to you? Keep in mind your words

Bayes will check the tokens against its database. Based on the number of
occurrences of each token in ham and spam, Bayes will return whether the
mail appears spammy or hammy (based on what it learned before), and its
confidence of that assessment.

This classification (ham or spam) and confidence will be scored by SA.
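
A rough sketch of how a per-token estimate can come out of such counts. It uses Robinson-style smoothing as commonly described for Bayesian spam filters; the function name, the prior x and the weight s are illustrative assumptions and are not SpamAssassin's actual code or constants.

```python
def token_spamminess(spam_hits, ham_hits, nspam, nham, s=1.0, x=0.5):
    """Smoothed estimate that a message containing this token is spam.

    x is the prior for an unknown token and s the weight given to that
    prior; both values are illustrative assumptions.
    """
    spam_ratio = spam_hits / nspam if nspam else 0.0
    ham_ratio = ham_hits / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x                          # never-seen token: no evidence
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_hits + ham_hits
    return (s * x + n * p) / (s + n)

# One spam learned, no ham: the token leans spammy, but only weakly.
print(token_spamminess(spam_hits=1, ham_hits=0, nspam=1, nham=0))   # 0.75
# After 200 of each, a token seen in 150 spam and 2 ham is strong evidence.
print(round(token_spamminess(150, 2, 200, 200), 3))                 # 0.984
```

With a single learned message the estimate stays close to the prior, which is one way to see why a threshold of 1 gives little to work with.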

Keep in mind there are a LOT more tokens in a message than merely the
words in the Subject and Body. This DOES have a severe impact on your
results, if your test spam is a self-generated message with the word
Viagra as Subject and Body. Nope, this is not a proper test environment.


  The bayes_min_(ham|spam)_num values ONLY control, how many messages
  Bayes needs to have learned, before it should start classifying mail.

 So my Bayes can classify mail (because the min_spam_num value is 1,
 the condition is met). What now? Will my new mail be marked as spam?
 Or will it get a higher score...?

It will be classified (by Bayes) based on the tokens in the message and
the previously learned statistics. Bayes does NOT only mark spam. It
also can report a message to look like ham.

Anyway, I asked you before to provide sa-learn --dump magic output. You
didn't. Given the intro, I seriously wonder whether the user you are
training Bayes as and the user scanning mail are the same at all.





Re: SA-learn (spamassassin)

2009-08-02 Thread Matt Kettler
monolit wrote:
 The question is logical. When SA learns new spam/ham, it has to write new
 info to the database, so I think the database has to grow in size. For
 example, if you have a *.doc file and modify it by adding several words,
 the *.doc file gets bigger.
   
The database doesn't need to grow in size.

A Berkeley DB file can contain free space. This is done to avoid
constantly shrinking and growing the file on disk. Deleted elements are
merely marked as free space for later use.

Therefore, data can be added to a Berkeley DB file without an increase
in file size.
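
The free-space idea can be pictured with a toy model. The class below is purely illustrative (all names invented) and is not how Berkeley DB lays out its pages; it only shows a store that reuses freed slots before it grows.

```python
class SlotFile:
    """Toy fixed-slot store that reuses freed slots before growing."""

    def __init__(self):
        self.slots = []   # stands in for the file on disk
        self.free = []    # indexes of slots marked as free

    def add(self, record):
        if self.free:
            self.slots[self.free.pop()] = record   # reuse free space
        else:
            self.slots.append(record)              # file has to grow

    def delete(self, index):
        self.slots[index] = None                   # mark as free, don't shrink
        self.free.append(index)

    def size(self):
        return len(self.slots)

db = SlotFile()
for token in ("viagra", "cheap", "today"):
    db.add(token)
db.delete(1)
print(db.size())   # 3
db.add("pills")
print(db.size())   # 3
```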