Re: Spamassassin & Bayes

Matt Kettler Fri, 13 Jan 2006 06:42:27 -0800

At 05:06 AM 1/13/2006, Markus Braun wrote:

Hello everyone,
At the moment I use SpamAssassin for spam protection.
Bayes auto-learning (autobayes) is also activated.
I know that when auto-learning makes a mistake with ham or spam, I can correct it manually,
e.g.
sa-learn (the file)

But what does sa-learn actually remember?

SpamAssassin's Bayes system breaks the message body and many of the headers up into "tokens". For the most part, tokens are simply words, but it also breaks email addresses up (username and domain parts are separate tokens), and does a lot of weird things I don't fully understand to grab bits of headers.
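To make the idea of "tokens" concrete, here is a rough Python sketch. It is not SpamAssassin's actual tokenizer (which does far more with headers); the function name tokenize and the "U:"/"D:" prefixes are made up for illustration only.

```python
import re

def tokenize(headers, body):
    """Toy tokenizer (hypothetical, NOT SpamAssassin's real code).

    Body words become tokens; email addresses found in headers are
    split into separate username and domain tokens.
    """
    # Body: lowercase words, digits and apostrophes
    tokens = re.findall(r"[a-z0-9']+", body.lower())

    # Headers: pull out addresses and split them at the "@"
    for name, value in headers.items():
        for user, domain in re.findall(r"([\w.+-]+)@([\w.-]+)", value):
            tokens.append(f"{name}:U:{user.lower()}")    # username part
            tokens.append(f"{name}:D:{domain.lower()}")  # domain part
    return tokens


if __name__ == "__main__":
    print(tokenize({"From": "Bob <bob@example.com>"}, "Buy cheap pills"))
```

Because the domain is its own token, a spammer who only changes the username part of the address still produces a token the database has seen before.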

sa-learn then takes all these and dumps them into a database, and tracks how many times each token was seen in spam, and how many times in nonspam. From these counts, it also calculates a probability that the token will be in a spam message (0.000 to 1.000, aka 0% to 100%).
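The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class ToyBayesDB and its methods are hypothetical, and the probability formula is the plain ratio of spam frequency to total frequency, not the exact (smoothed) calculation SpamAssassin performs.

```python
from collections import defaultdict

class ToyBayesDB:
    """Hypothetical in-memory stand-in for the Bayes token database."""

    def __init__(self):
        self.spam_counts = defaultdict(int)  # token -> spam messages containing it
        self.ham_counts = defaultdict(int)   # token -> nonspam messages containing it
        self.nspam = 0                       # spam messages learned
        self.nham = 0                        # nonspam messages learned

    def learn(self, tokens, is_spam):
        counts = self.spam_counts if is_spam else self.ham_counts
        for tok in set(tokens):              # count each token once per message
            counts[tok] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def probability(self, tok):
        """Spam probability for one token, or None if never seen."""
        s = self.spam_counts.get(tok, 0)
        h = self.ham_counts.get(tok, 0)
        if s + h == 0:
            return None
        spam_ratio = s / max(self.nspam, 1)  # fraction of spam containing tok
        ham_ratio = h / max(self.nham, 1)    # fraction of nonspam containing tok
        return spam_ratio / (spam_ratio + ham_ratio)


if __name__ == "__main__":
    db = ToyBayesDB()
    db.learn(["viagra", "cheap"], is_spam=True)
    db.learn(["meeting", "cheap"], is_spam=False)
    for tok in ("viagra", "cheap", "meeting", "unknown"):
        print(tok, db.probability(tok))
```

A token seen only in spam ends up near 1.000, one seen only in nonspam near 0.000, and one seen equally in both sits at 0.500.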

Later, when a message is scanned, SpamAssassin breaks it up into tokens and checks the database. From the database it pulls out the probabilities for all the tokens that match and combines them to calculate a total probability for the message. (It uses a chi-squared combine, if you're a statistics geek.)
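For the statistics geeks, here is a minimal Python sketch of chi-squared combining in the style of Fisher's method, as popularised for spam filtering by Gary Robinson. It is written from the general technique, not taken from SpamAssassin's source; the function names and the 0.01/0.99 clamp are my own assumptions.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function P(X >= x2), valid for even dof only."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one message probability."""
    # Clamp so log() never sees exactly 0 or 1 (assumed limits)
    probs = [min(max(p, 0.01), 0.99) for p in probs]
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way

    # S: how strongly the tokens point to spam
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # H: how strongly the tokens point to nonspam
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)

    # Near 1.0 means spam, near 0.0 means nonspam, 0.5 means unsure
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    print(combine([0.99, 0.95, 0.90]))  # mostly spammy tokens
    print(combine([0.02, 0.10, 0.05]))  # mostly hammy tokens
    print(combine([0.99, 0.01]))        # conflicting evidence
```

The nice property of this approach is that strongly conflicting evidence lands near 0.5 instead of being forced to one extreme.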

If you really want to see Bayes running in gross detail, try adding -D to an sa-learn or spamassassin run. You can see the Bayes tokens from the messages being processed in the debug output.

The next day, a spam email also comes from another email address, but the address name and the header information are the same.
So my question is: how can I make sure these emails are also marked as spam?

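sa-learn will help here: feed it the spam that got through (sa-learn --spam), and Bayes will pick up the tokens those messages have in common. Another way to help is by grabbing some of the rulesets off of rulesemporium.com that cover the spam which is giving you the most trouble.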
