That's a very comprehensive answer there, Matt. Most appreciated... :D

One more, though: sa-learn --ham. Is this to explicitly mark what should not be learnt as spam? So should you feed it the rest of your mailbox?

I've just created the two folders and I'm opening them up to others (a small, trusted fraternity, i.e. the email group) so they can upload their spam to them.

So is it simply a case of putting whatever isn't spam into the ham folder?

Thanks,
Ronan

Matt Kettler wrote:
At 02:29 PM 11/9/2004 +0000, Ronan wrote:

1) Am I right in thinking that I can run sa-learn --spam on a folder containing spam, most of which already has SpamAssassin headers marking it as such, and that sa-learn knows to disregard the SpamAssassin headers (or all headers, for that matter)?


SA's Bayes subsystem tracks which message IDs it has already learned from and what they were learned as. It will not relearn the same message unless you tell SA to change what it was learned as.

SA can (and does) learn useful information from mail already tagged as spam, so feeding tagged mail to sa-learn is useful, not redundant. It will only skip messages it has already learned or autolearned.
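To make the "learned once per message" behaviour concrete, here is a minimal Python sketch under simplifying assumptions: the class name SeenTracker and its learn() method are hypothetical, and a plain dict keyed on message ID stands in for whatever SA actually stores. It only models the rule described above (skip if already learned as the same class, relearn if told to change the class); it is not SpamAssassin's implementation.

```python
# Hypothetical, simplified model of Bayes "seen" tracking.
# Not SpamAssassin's real code: just the skip/relearn rule described above.

class SeenTracker:
    """Remember which class ("spam" or "ham") each message ID was learned as."""

    def __init__(self):
        self.seen = {}  # message ID -> "spam" or "ham"

    def learn(self, msg_id, label):
        """Return the action taken: 'learned', 'skipped' or 'relearned'."""
        previous = self.seen.get(msg_id)
        if previous == label:
            return "skipped"       # already learned (or autolearned) as this class
        self.seen[msg_id] = label
        if previous is None:
            return "learned"       # first time this message has been seen
        return "relearned"         # told to change the class: old result replaced
```

Feeding the same tagged message twice as spam would return "learned" and then "skipped"; feeding it once more as ham would return "relearned".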

sa-learn automatically ignores headers generated by SA itself. You can add bayes_ignore_header lines to your local.cf to make it ignore headers added by other tools.
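As a rough illustration of what "ignoring headers" means for training, the Python sketch below drops headers before a message would be tokenized. The function name strip_ignored_headers is hypothetical, treating every X-Spam-* header as "added by SA itself" is a simplifying assumption, and the extra_ignored list merely plays the role that bayes_ignore_header entries (for example, a line such as `bayes_ignore_header X-Virus-Scanned`, where that header name is only an example) play in local.cf.

```python
from email import message_from_string

def strip_ignored_headers(raw, extra_ignored=()):
    """Return the parsed message with headers that training should ignore removed.

    Assumption: X-Spam-* headers stand in for the ones SpamAssassin adds itself;
    extra_ignored mimics bayes_ignore_header entries for other tools' headers.
    """
    msg = message_from_string(raw)
    ignored = {h.lower() for h in extra_ignored}
    for name in set(msg.keys()):          # iterate over a snapshot of header names
        lname = name.lower()
        if lname.startswith("x-spam-") or lname in ignored:
            del msg[name]                 # removes every occurrence of this header
    return msg
```

For a message carrying From, X-Spam-Status, X-Virus-Scanned and Subject headers, calling strip_ignored_headers(raw, ["X-Virus-Scanned"]) leaves only From and Subject for the tokenizer to see.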



2) How will the Bayesian checking affect the load? I've tweaked things so that my servers currently sit at 0-5% idle during peak, and anything more will probably make them fall over.


Bayes adds quite a bit of load, but if you're using some insanely large rulesets (i.e. anything over 256 KB), it's insignificant by comparison.


3) How will Bayes affect the need for some of the rulesets I have? No, strike that.
3b) How does Bayes affect rulesets from, say, exit0/rulesemporium? Can any be done away with? Are any made practically obsolete by a well-trained Bayes DB?


Theoretically, any and all rules can be made obsolete by a well-trained Bayes DB. The other rules exist to offset the amount of training work needed to get good results. You can get great results from a Bayes-only system, but you have to train it heavily and constantly.

SA's rules pick up the slack if you're not training on 200 spams and 200 hams every day.



4) Anything else I should be looking into?


Hardware upgrades so you can run some more CPU intensive stuff? :)



-- Regards

Ronan McGlue
==============
Analyst/Programmer
Information Services
Queens University Belfast
BT7 1NN
