OK. I am sure you are all going to think this question comes from not reading ... on the contrary, I have read countless threads, and I even bought the SpamAssassin book from O'Reilly. I am still confused :(
<snip>
How do I know that it is actually doing the auto-whitelisting? Will it write the auto-whitelist to ~home/.spamassassin/auto-whitelist?
Yes, it will do it to ~/.spamassassin/auto-whitelist.
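Every email I get has autolearn=no. Is this related to the auto-whitelist or to Bayes?
autolearn has to do with Bayes, not the AWL (auto-whitelist). autolearn=no only means that particular message was not auto-learned, usually because its score fell between the auto-learn thresholds.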
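If you want to check what a given message was tagged with, a small script can pull the autolearn value out of the X-Spam-Status header. This is only a sketch: the sample header below is made up for illustration, and the exact header layout depends on your SpamAssassin version and configuration.

```python
import re
from email import message_from_string


def autolearn_status(raw_message):
    """Return the autolearn value from X-Spam-Status, or None if it is absent."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status", "")
    # The header is usually folded across several lines; collapse the whitespace.
    header = " ".join(header.split())
    match = re.search(r"\bautolearn=(\w+)", header)
    return match.group(1) if match else None


# Hypothetical message, invented for this example.
sample = (
    "X-Spam-Status: No, score=2.1 required=5.0 tests=HTML_MESSAGE\n"
    "\tautolearn=no version=3.0.0\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(autolearn_status(sample))
```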
I read that the use_bayes option is set to 1 by default (which means enabled). I assume this means Bayes is always working. All that I have to do now is train it, right?
Maybe. If you don't have the DB_File Perl module installed, Bayes will not work, so you might need to install it. Otherwise, you are right.
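One quick way to see whether a Perl module is installed is to ask perl to load it and check the exit status. A minimal sketch, assuming `perl` is on your PATH and is the same perl that SpamAssassin runs under (the helper name is made up for this example):

```python
import subprocess


def perl_module_available(module):
    """Return True if `perl -M<module> -e 1` exits cleanly."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # perl itself was not found on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("DB_File available:", perl_module_available("DB_File"))
```

If this reports False, installing DB_File through your distribution's packages or CPAN is the usual fix.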
I also read that the bayes_auto_learn directive is enabled by default, and that it automatically learns from spam with high scores and ham with low ones.
I assume that since these are on by default, I do not need to add them to local.cf, right?
Correct.
I have read in countless places that the way to train spam / ham is:
    sa-learn --no-rebuild --spam mail/spammailbox
    sa-learn --no-rebuild --ham mail/notspambox
    sa-learn --rebuild
Last of my questions ... (well .. for now anyway ... :)
Is mail/notspambox just a normal user's mailbox on the mail server?
With those parameters, sa-learn expects either a single file containing a single email or a directory full of single-email files.
If you want to use a mailbox file, add --mbox to the command line.
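To make the difference concrete, here is a rough sketch that builds the three commands from the question, with --mbox added when the paths are mailbox files. The function names are invented for this example, and `train` assumes sa-learn is on your PATH.

```python
import subprocess


def training_commands(spam_path, ham_path, mbox=False):
    """Build the three sa-learn invocations quoted in the question."""
    # --mbox tells sa-learn the path is a mailbox file, not a single message or directory.
    fmt = ["--mbox"] if mbox else []
    return [
        ["sa-learn", "--no-rebuild", "--spam", *fmt, spam_path],
        ["sa-learn", "--no-rebuild", "--ham", *fmt, ham_path],
        ["sa-learn", "--rebuild"],
    ]


def train(spam_path, ham_path, mbox=False):
    """Run the commands in order, stopping at the first failure."""
    for cmd in training_commands(spam_path, ham_path, mbox=mbox):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call train() to actually run them.
    for cmd in training_commands("mail/spammailbox", "mail/notspambox", mbox=True):
        print(" ".join(cmd))
```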
How do I get the spam back into the spam mailbox without forwarding it from an internal address, which would then trigger the "trusted networks" handling?
One approach is to forward the spam as an attachment, strip the attachment out, and train on that.
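The "strip the attachment" step could look something like the sketch below, which pulls message/rfc822 parts out of a forwarded message using Python's standard email package. It assumes the mail client forwards the original as a proper message/rfc822 attachment; the sample message is invented for the example, and the re-serialized output may not be byte-for-byte identical to the original spam.

```python
import email


def _collect(part, found):
    """Append every attached message under `part` to `found` as raw bytes."""
    if part.get_content_type() == "message/rfc822":
        payload = part.get_payload()
        # For message/rfc822 the payload is a list holding the embedded message.
        if isinstance(payload, list):
            found.extend(inner.as_bytes() for inner in payload)
        return  # do not descend into the attached message itself
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_attached_messages(raw_forward):
    """Return the messages attached to a forwarded email, as a list of bytes."""
    found = []
    _collect(email.message_from_bytes(raw_forward), found)
    return found


# Hypothetical forward-as-attachment message, made up for illustration.
sample_forward = (
    b"Subject: Fwd: spam\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"see attached\r\n"
    b"--XX\r\n"
    b"Content-Type: message/rfc822\r\n"
    b"\r\n"
    b"Subject: Cheap pills\r\n"
    b"From: spammer@example.com\r\n"
    b"\r\n"
    b"Buy now\r\n"
    b"--XX--\r\n"
)

if __name__ == "__main__":
    for original in extract_attached_messages(sample_forward):
        print(original.decode("ascii", "replace"))
```

Each extracted message can then be written to its own file in a directory and handed to sa-learn --spam.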
Others include using IMAP and having a script dig through the IMAP server and feed messages to sa-learn. Search the archives; this is a common question.
I would assume it is better for Bayes to be on a per-user basis, but I do not allow users to log on to the mail server, so how do I do training on a per-user basis?
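As a rough idea of what such an IMAP script might look like, the sketch below saves every message in a folder to its own file, giving you a directory of single emails that sa-learn can read. The host, login, folder name, and output path in the usage comment are all placeholders, and `dump_folder` is a name invented for this example.

```python
import imaplib
import os


def dump_folder(conn, folder, out_dir):
    """Save each message in `folder` as its own file under `out_dir`.

    `conn` is an already logged-in imaplib connection. Returns the number saved.
    """
    os.makedirs(out_dir, exist_ok=True)
    # readonly=True opens the folder without changing message flags.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    saved = 0
    for num in data[0].split():
        status, fetched = conn.fetch(num, "(RFC822)")
        if status != "OK" or not fetched or not isinstance(fetched[0], tuple):
            continue
        raw = fetched[0][1]  # fetched[0] is an (envelope, raw message bytes) pair
        with open(os.path.join(out_dir, f"{num.decode()}.eml"), "wb") as fh:
            fh.write(raw)
        saved += 1
    return saved


# Usage sketch (placeholder host, credentials, and paths):
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("trainer", "secret")
#   dump_folder(conn, "Spam", "/tmp/spam-train")
#   conn.logout()
# and afterwards: sa-learn --no-rebuild --spam /tmp/spam-train
```

Users then only need to drag misclassified mail into a shared or per-user "Spam" folder; they never have to log on to the mail server itself.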
Per user is better, but site-wide is still "good" if per user becomes impractical. This is particularly true in a corporate environment, where most users are going to have similar email anyway (because a lot of it is going to be related to your market).
I know there were a lot of questions. I have been working on this for a while now, and these are my points of confusion. Any help would be greatly appreciated.
Thanks, Peter