At 05:06 AM 1/13/2006, Markus Braun wrote:
Hello everyone,

At the moment I use SpamAssassin for spam protection.
Bayes auto-learning is also activated.

I know that when auto-learning makes a mistake with ham or spam, I can correct it manually.

e.g.
sa-learn (the file)

But what does sa-learn actually remember?


SpamAssassin's bayes system breaks the message body and many of the headers up into "tokens". For the most part, tokens are simply words, but it also breaks email addresses up (username and domain parts are separate tokens), and does a lot of weird things I don't fully understand to grab bits of headers.

sa-learn then takes all these and dumps them into a database, and tracks how many times each token was seen in spam, and how many times in nonspam. From these counts, it also calculates a probability that the token will be in a spam message (0.000 to 1.000, aka 0% to 100%).
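To make the counting idea concrete, here is a rough sketch in Python. It is not SpamAssassin's actual code: the tokenize function, the TokenDB class, and the simple ratio formula are my own simplifications for illustration, and the real tokenizer and probability calculation are more involved.

```python
from collections import Counter


def tokenize(text):
    """Very simplified tokenizer (an assumption, not SpamAssassin's rules).

    Words become tokens; email addresses are split into a username token
    and a domain token.
    """
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,;:!?()<>\"'")
        if not word:
            continue
        if "@" in word:
            user, _, domain = word.partition("@")
            tokens.extend(t for t in (user, domain) if t)
        else:
            tokens.append(word)
    return tokens


class TokenDB:
    """Hypothetical in-memory stand-in for the bayes database."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each token once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def probability(self, token):
        """Return a 0.0-1.0 spam probability, or None for unseen tokens."""
        s = self.spam_counts[token]
        h = self.ham_counts[token]
        if s + h == 0:
            return None
        spam_ratio = s / self.nspam if self.nspam else 0.0
        ham_ratio = h / self.nham if self.nham else 0.0
        return spam_ratio / (spam_ratio + ham_ratio)


db = TokenDB()
db.learn("cheap pills from bob@example.com", is_spam=True)
db.learn("meeting notes from alice@example.com", is_spam=False)
```

In this toy database a token seen only in spam gets 1.0, a token seen only in nonspam gets 0.0, and a token seen equally in both lands at 0.5.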

Later, when a message is scanned, SpamAssassin breaks it up into tokens and checks the database. It pulls out the probabilities for all the tokens that match and combines them to calculate a total probability for the message. (It uses a chi-squared combine, if you're a statistics geek.)
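For the statistics geeks, chi-squared combining can be sketched roughly as below. This is the general Fisher/Robinson-style method written from memory, not SpamAssassin's own implementation; the function names chi2q and combine, and the clamping to 0.01 and 0.99, are assumptions I made for the example.

```python
import math


def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    exceeds x2. Only valid for even v, which is all this sketch needs."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one message probability."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Clamp so log(0) never happens (the bounds are an arbitrary choice).
    clamped = [min(max(p, 0.01), 0.99) for p in probs]
    # s approaches 1 when the tokens look spammy, h when they look hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clamped), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clamped), 2 * n)
    return (s - h + 1.0) / 2.0
```

The nice property of this kind of combine is that a pile of strongly spammy tokens pushes the result toward 1.0, a pile of strongly hammy tokens pushes it toward 0.0, and conflicting or neutral evidence lands near 0.5.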

If you really want to see bayes running in gross detail, try adding -D to an sa-learn or spamassassin run. The debug output shows the bayes tokens from the messages being processed.



The next day, a spam email also arrives from a different email address, but the sender name and the header information are the same.

So my question is: how can I make sure these emails are also marked as spam?

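sa-learn will help here. Since the name and header information stay the same, many of the tokens it learned from the first message will still match, even though the address has changed. Another way to help is by grabbing some of the rulesets off of rulesemporium.com that cover the spam which is giving you the most trouble.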
