Hello tony,

Tuesday, March 16, 2004, 11:08:43 PM, you wrote:

>> TB> Hey cool, done that now. Just looked at the headers of a message
>> TB> received which says "autolearn=ham" This was a message from the SA
>> TB> group funnily enough - presumably that is correct?
>>
>> Unless that message included spam samples, then no problem.
>>
>> I suggest you set your non-spam auto-learn threshold to -0.01 to make
>> sure that spam that hits no rules is not accidentally learned as ham.

tac> errr...what? who? where? how???

I use:
> auto_learn_threshold_nonspam -2
> bayes_auto_learn_threshold_nonspam -2
I forget which version applies to 2.5x and which to 2.6x -- adapt to the
score you want to use as a threshold, and put it into your local config
file (eg: local.cf or whatever).

tac> Also, looking at more headers - if they say "autolearn=no", does that
tac> just mean SA had no idea if it was spam or ham, or does it just mean that
tac> autolearn is off and I was looking at an old message? ;)else?

No, autolearn=no simply means the email didn't score high enough (as
spam) or low enough (as non-spam) to be auto-learned. It means auto-learn
is on, but the email message didn't qualify.

>> My understanding is that each domain with a $HOME will have one
>> $HOME/.spamassassin directory, and the bayes database built there will
>> apply to all [EMAIL PROTECTED] for that domain.

tac> Cool. That does indeed seem to be the case - my mailboxes were
tac> refreshingly free of spam this morning - hurrah!

>> cp /dev/null $file
>> or
>> cat </dev/null >$file
>> are two methods I've used to empty files.

tac> okay will do that - is there any advantage of one over the other apart
tac> from less typing? ;)

None that we can measure.

>> TB> - is my first ever shell script!:
tac> [..]
>> TB> Any obvious flaws there guys, or something I could do better? It
>>
>> Looks good to me.
>> I wouldn't cat them all into one file first, since my
>> understanding is that the shorter/quicker sa-learn runs are better (less
>> chance they'll block bayes update by incoming email and auto-learn).

tac> okay, thanks m8. You cat them in your script though don't you?

No. My commands are:
> sa-learn --spam --mbox sa.learn.spam       # do the sa-learn
> ls -lF `pwd`/sa.learn.spam                 # record this file in my log
> cat sa.learn.spam >>~/mail/cw-spam/inbox   # append to my corpus
> cat ~/mynull > sa.learn.spam               # empty the mailbox

>> TB> If the former, then presumably my script would be better off
>> TB> concatenating the spam and ham files before passing them to a
>> TB> single run of sa-learn?
>>
>> I run my scripts once an hour.

tac> Blimey - do you get THAT much spam? ;)

7-8k spam a week, and will probably hit 9k around June.

>> You need 200+ spams and 200+ hams before Bayes takes effect and starts
>> applying its scores to your emails. It then remains effective unless you
>> drop below those numbers (such as by deleting the database files and
>> starting over). That has nothing to do with sa-learn. The more often
>> sa-learn runs, the more current your bayes database is.

tac> Okay, thanks.

tac> For ham, do you just copy everything from your inbox (apart from spam not
tac> caught) or is there stuff you WOULDN'T put through the spam filter? eg,
tac> all the posts to this list?

I do not sa-learn the SA mailing lists, nor any other mail which contains
samples of spam, nor discussions of spam. Otherwise I sa-learn everything.

tac> I am on a number of lists and the volume would make perfect ham material,
tac> but I'd be worried sometimes that the content wouldn't and certain
tac> characteristics - eg being sent to a large no. of people, me not being
tac> explicitly set as a recipient.

If you (or your people) ever get non-spam from the same people who use
those lists, discussing the same topics, then learning them will be a big
help (avoid FPs).
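
As a rough sketch only (not the script actually run here), the four
commands listed earlier could be wrapped in a shell function that refuses
to empty the mailbox if sa-learn fails. The function name learn_spam_mbox
and the SA_LEARN variable are made up for illustration; the sa-learn
flags and the idea of copying /dev/null over the file come from this
thread.

```shell
#!/bin/sh
# Sketch: learn one mbox as spam, log it, archive it, then empty it.
# SA_LEARN and learn_spam_mbox are illustrative names, not from the thread.
SA_LEARN="${SA_LEARN:-sa-learn}"

learn_spam_mbox() {
    mbox="$1"      # mbox of unlearned spam, e.g. sa.learn.spam
    corpus="$2"    # corpus mbox to append to, e.g. ~/mail/cw-spam/inbox

    # Nothing to do if the mbox is missing or already empty.
    [ -s "$mbox" ] || return 0

    # Learn first; bail out before touching the file if learning fails.
    "$SA_LEARN" --spam --mbox "$mbox" || return 1

    ls -lF "$mbox"                         # record this file in the log
    cat "$mbox" >> "$corpus" || return 1   # append to the corpus
    cp /dev/null "$mbox"                   # empty the mailbox
}
```

Called once an hour from cron as, say,
learn_spam_mbox sa.learn.spam ~/mail/cw-spam/inbox, this would keep each
sa-learn run short while leaving the spam in place for the next run if
anything goes wrong.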

tac> For spam, is there any value in passing already identified spam (sent to
tac> the spambox) thru sa-learn?

a) If you do multiple domains as I do, then it's easier to simply feed all
   spam into all three domains rather than figure out where it has already
   been learned.

b) For simplicity of management, I dump spam from all three domains into
   one spamtrap. It's easier for me then to sa-learn all of them, rather
   than keep the already learned spam separate.

Bob Menschel
