When I am trying to train Bayes, everyone says you have to remove the message header first.
I assume this means the spam tag that SpamAssassin adds? If the spam is in a mailbox on the server, how do you remove the tag?



Thanks, Peter

At 09:56 AM 2/3/2005, Peter Marshall wrote:
>OK.  I am sure you all are going to think this question comes from not
>reading ... on the contrary, I have read countless threads, and I even
>bought the SpamAssassin book from O'Reilly. Still confused :(

<snip>

>How do I know that it is actually doing the auto whitelist ?
>Will this do the auto whitelisting to ~home/.spamassassin/auto-whitelist ?

Yes, it will do it to ~/.spamassassin/auto-whitelist.

>Every email I get has autolearn=no.  Is this related to auto-whitelist or
>bayes ?

autolearn has to do with bayes, not the AWL.

>I read that use_bayes option is set to 1 by default (which means enabled).
>I assume this means bayes is always working.  All that I have to do now is
>train it, .. right ?

Maybe. If you don't have DB_File, Bayes will not work, so you might need
to install that Perl module. Otherwise, you are right.
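
A quick way to see whether DB_File is present is to ask Perl to load it. The snippet below is only a sketch of that check: the helper name perl_module_available is made up for illustration, and it assumes the perl that SpamAssassin uses is the one on your PATH.

```python
import subprocess


def perl_module_available(module: str = "DB_File") -> bool:
    """Return True if `perl -M<module> -e 1` exits cleanly.

    Hypothetical helper: it only proves that the perl found on PATH
    can load the module, which is assumed to be the same perl that
    SpamAssassin runs under.
    """
    try:
        result = subprocess.run(
            ["perl", "-M" + module, "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # perl itself is not installed or not on PATH
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("DB_File available:", perl_module_available())
```

If this reports False, installing the module (from your distribution's packages or from CPAN) is the likely next step.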

>I also read that the bayes_auto_learn directive is
>set by default to enabled, and this option will automatically classify spam
>with high scores, or ham with low ones ...
>
>I assume that since these are on by default that I do not need to add them
>to the local.cf ... right ?


Correct.


>I have read in countless places that the way to train spam / ham is
>"sa-learn --no-rebuild --spam mail/spammailbox"
>"sa-learn --no-rebuild --ham mail/notspambox"
>"sa-learn --rebuild"
>
>Last of my questions ... (well .. for now anyway ... :)
>
>Is mail/notspambox just a normal user's mailbox on the mail server ?

With those parameters, sa-learn treats the path as either a single file
containing one email, or a directory full of single-email files.

If you want to use a mailbox file, add --mbox to the command line.
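
To make the file / directory / mbox distinction concrete, here is a rough sketch that builds the sa-learn command line for a given path. The function names are invented for this example, and the "first line starts with From " test is only a common heuristic for spotting an mbox file, not something sa-learn itself does.

```python
import os
from typing import List


def looks_like_mbox(path: str) -> bool:
    """Heuristic (an assumption): mbox files begin with a 'From ' line."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as fh:
        return fh.readline().startswith(b"From ")


def sa_learn_command(path: str, kind: str) -> List[str]:
    """Build the argument list for one training run.

    kind is "spam" or "ham". --mbox is added only when the path looks
    like a mailbox file; a directory or a single-message file is passed
    as-is, matching the commands quoted above.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--no-rebuild", "--" + kind]
    if looks_like_mbox(path):
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    # Paths taken from the question; adjust to your own layout.
    print(" ".join(sa_learn_command("mail/spammailbox", "spam")))
    print(" ".join(sa_learn_command("mail/notspambox", "ham")))
```

The resulting list can be handed to subprocess.run, followed by a final "sa-learn --rebuild" once all mailboxes have been fed in.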

>How do I get the spam back to the spammailbox without forwarding it from
>internal ... which would then get the "trusted networks"

One approach is to forward the spam as an attachment, strip the attachment
back out, and train on that.
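
As an illustration of the "strip the attachment" step, the sketch below pulls any message/rfc822 attachments out of a forwarding wrapper using Python's standard email package. The function name is made up for this example, and it assumes the mail client forwards the original as a proper message/rfc822 part (not inline).

```python
from email import message_from_bytes
from email.message import Message
from typing import List


def extract_attached_messages(raw: bytes) -> List[bytes]:
    """Return the raw bytes of each message/rfc822 attachment in `raw`.

    Assumes "forward as attachment" produced message/rfc822 parts.
    Messages nested inside an extracted message are left in place.
    """
    found: List[bytes] = []

    def visit(part: Message) -> None:
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-item list
            # holding the attached message.
            found.append(part.get_payload(0).as_bytes())
            return
        if part.is_multipart():
            for sub in part.get_payload():
                visit(sub)

    visit(message_from_bytes(raw))
    return found
```

Each returned message can be written to its own file in a directory, and that directory passed to sa-learn --spam, so Bayes sees the original headers and not those of the internal forward.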

Others include using IMAP and having a script dig through the IMAP server
and feed messages to sa-learn. Search the archives; this is a common question.
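
A minimal sketch of the IMAP idea, using Python's imaplib: the function yields the raw text of every message in one folder. The function name and the folder name "Spam" are placeholders, and it assumes users drag missed spam into such a folder themselves.

```python
import imaplib  # conn below is expected to be an imaplib.IMAP4 / IMAP4_SSL object


def fetch_folder_messages(conn, folder="Spam"):
    """Yield the raw RFC 822 bytes of each message in `folder`.

    `conn` must already be logged in. The folder is opened read-only
    so that fetching does not change any message flags.
    """
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot select folder %r" % folder)
    status, data = conn.search(None, "ALL")
    if status != "OK":
        return
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        # A successful fetch returns [(envelope, message_bytes), b")"]
        if status == "OK" and parts and isinstance(parts[0], tuple):
            yield parts[0][1]
```

To use it, connect with imaplib.IMAP4_SSL (or IMAP4), log in, and pass each yielded message to sa-learn, for example on standard input via subprocess.run(["sa-learn", "--spam"], input=raw). Host name, credentials, and folder layout are site-specific and not shown here.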

>I would assume it is better for bayes to be on a per-user basis ...  but I
>do not allow users to log on to the mailserver, so how do I do training on a
>per-user basis ?

Per user is better, but site-wide is still "good" if per user becomes
impractical. This is particularly true in a corporate environment, where
most users are going to have similar email anyway (because a lot of it is
going to be related to your market).



>I know there were a lot of questions ... I have been working on this for a
>while now, and these are my points of confusion. Any help would be greatly
>appreciated.
>
>Thanks,
>Peter




