At 03:05 PM 10/18/2004, Payal Rathod wrote:
> Hi,
> I have SA and Bayes running for a few accounts and have trained Bayes on
> around 1200 hams and 1000 spams. Is this enough? Spam has dropped a lot,
> but a few spam mails still get through. What do I do about them?
> I train manually, checking each mailbox, so it is a hectic process.
> When do I stop training, and when do I start again (rule of thumb!)?

Personally, I manually seed Bayes with a large volume of ham and spam at the start. From there I rely mostly on some special mail accounts I've set up and on the autolearn process, and occasionally hand-feed it a few "problem" emails.


About my "special" accounts:

One of these accounts is a "nonspam" account. I've manually subscribed it to select trusted industry newsletters and general news alert systems from reputable sites, kept its address otherwise quite secret, and occasionally screen it for spam. Some less-trusted sites are registered under their own aliases, which are funneled into this account but can be quickly cut off if they start sourcing spam.
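The post doesn't say how the per-site aliases are managed. As a minimal sketch, assuming a sendmail-style /etc/aliases file, a small Python helper could emit one entry per site alias and simply leave out any alias that has been cut off. The account name `nonspam`, the example aliases, and `build_nonspam_aliases` are all hypothetical.

```python
# Hypothetical local account that collects trusted mail for ham training.
NONSPAM_ACCOUNT = "nonspam"


def build_nonspam_aliases(site_aliases, revoked):
    """Return /etc/aliases-style lines funneling per-site aliases into the ham account.

    site_aliases: mapping of alias name -> site it was registered with
    revoked: alias names that started sourcing spam; they are omitted,
             so mail to them no longer resolves locally (cut off).
    """
    lines = []
    for alias in sorted(site_aliases):
        if alias in revoked:
            continue  # cut off: no alias entry means the address stops resolving
        lines.append(f"# registered with {site_aliases[alias]}")
        lines.append(f"{alias}: {NONSPAM_ACCOUNT}")
    return lines


if __name__ == "__main__":
    sites = {
        "news-example": "news.example.com",
        "vendor-example": "vendor.example.net",
    }
    for line in build_nonspam_aliases(sites, revoked={"vendor-example"}):
        print(line)
```

After the file is rewritten, `newaliases` rebuilds the alias database. Dropping a revoked alias outright means mail to it bounces as an unknown user.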

The other is a "spam" account. This is actually the destination for several email addresses I've seeded around, as well as several system accounts that have never been valid here. I monitor this input for poison and for misdirected or misguided emails from the intellectually challenged, yet I rarely see any.

The mailboxes for these accounts are picked up by a daily cron job and fed to sa-learn, then archived on a rotating basis in a directory where I can review the last week or so at my whim.
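The cron job itself isn't shown in the post. A minimal Python sketch of what such a job might look like, assuming hypothetical mailbox paths, a hypothetical archive directory, and a seven-day retention window: it runs `sa-learn --spam --mbox` or `sa-learn --ham --mbox` on each non-empty mailbox, moves the learned mailbox into a timestamped archive file, and deletes archives older than a week. It does not lock the mailbox against a delivery arriving mid-move, which a real deployment would need to handle.

```python
import os
import shutil
import subprocess
import time

# Hypothetical paths: adjust to wherever the trap mailboxes actually live.
SPAM_MBOX = "/var/spool/mail/spam"
HAM_MBOX = "/var/spool/mail/nonspam"
ARCHIVE_DIR = "/var/lib/sa-archive"
KEEP_DAYS = 7  # keep roughly the last week of learned mail for review


def learn_command(kind, mbox):
    """Build an sa-learn call; kind is "spam" or "ham", --mbox marks an mbox file."""
    return ["sa-learn", f"--{kind}", "--mbox", mbox]


def archive_mailbox(mbox, archive_dir, now):
    """Move a learned mailbox into the archive under a timestamped name."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = os.path.join(archive_dir, f"{os.path.basename(mbox)}.{stamp}")
    shutil.move(mbox, dest)
    # Recreate an empty mailbox; ownership and permissions are NOT restored here.
    open(mbox, "a").close()
    return dest


def select_expired(entries, keep_days, now):
    """Given (name, mtime) pairs, return the names older than keep_days."""
    cutoff = now - keep_days * 86400
    return [name for name, mtime in entries if mtime < cutoff]


def prune_archives(archive_dir, keep_days, now):
    """Delete archived mailboxes that have aged out of the review window."""
    entries = [
        (name, os.path.getmtime(os.path.join(archive_dir, name)))
        for name in os.listdir(archive_dir)
    ]
    for name in select_expired(entries, keep_days, now):
        os.remove(os.path.join(archive_dir, name))


def main():
    now = time.time()
    for kind, mbox in (("spam", SPAM_MBOX), ("ham", HAM_MBOX)):
        if not os.path.isfile(mbox) or os.path.getsize(mbox) == 0:
            continue  # nothing delivered since the last run
        subprocess.run(learn_command(kind, mbox), check=True)
        archive_mailbox(mbox, ARCHIVE_DIR, now)
    if os.path.isdir(ARCHIVE_DIR):
        prune_archives(ARCHIVE_DIR, KEEP_DAYS, now)


if __name__ == "__main__":
    main()
```

It could be scheduled with a crontab entry such as `30 4 * * * /usr/bin/python3 /usr/local/sbin/sa_train.py` (script path hypothetical). Keeping `select_expired` separate from `prune_archives` lets the retention rule be checked without touching the filesystem.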

To build up the "spam" account, check your mail logs and /var/spool/mail for obviously invalid addresses that spammers guess from default Red Hat Linux installs, which alias a ton of accounts to root. Also check /var/spool/mail to see whether any non-user accounts have mailboxes that are piling up spam.
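As a rough sketch of that audit, assuming sendmail-style log lines of the form `<user@domain>... User unknown` and the Red Hat convention of the time that real users start at UID 500 (newer distributions typically use 1000), the following Python counts rejected local parts and lists spool files named after system accounts or after no account at all. The function names and the UID threshold are assumptions for illustration.

```python
import os
import pwd
import re
import sys
from collections import Counter

# Sendmail-style rejection, e.g. "<games@example.com>... User unknown".
UNKNOWN_RE = re.compile(r"<([^<>@\s]+)(?:@[^<>\s]*)?>\.\.\. User unknown")
UID_MIN = 500  # assumption: Red Hat-era boundary between system and real users


def unknown_recipients(log_lines):
    """Count local parts the MTA rejected as unknown users."""
    counts = Counter()
    for line in log_lines:
        match = UNKNOWN_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def passwd_uid(name):
    """Return the UID for a login name, or None if no such account exists."""
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        return None


def is_non_user(name, uid_of=passwd_uid, uid_min=UID_MIN):
    """True for names with no account or with a system-range UID."""
    uid = uid_of(name)
    return uid is None or uid < uid_min


def unclaimed_spools(spool_dir, uid_of=passwd_uid, uid_min=UID_MIN):
    """List (name, size) for spool files named after non-users, largest first."""
    hits = []
    for name in os.listdir(spool_dir):
        path = os.path.join(spool_dir, name)
        if name == "root" or not os.path.isfile(path):
            continue  # root's mailbox normally holds real admin mail
        if is_non_user(name, uid_of, uid_min):
            hits.append((name, os.path.getsize(path)))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: python3 find_traps.py /var/log/maillog
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as fh:
            for local, n in unknown_recipients(fh).most_common(20):
                print(f"{n:6d}  {local}")
```

Names that show up both as frequent "User unknown" rejections and as fat spool files are the natural first candidates for aliasing into the spam account.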

My spam-seeding technique is to use obviously invalid example addresses like [EMAIL PROTECTED] in some of my explanations of how to set up various auto-notifiers for miscellaneous net-admin tools. I let these incubate for a while, then enable them as spam aliases once spam starts rolling in as undeliverables. This works quite well, and it works VERY well on mailing lists that have Usenet mirrors.
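The incubation step could be automated along these lines. This is a hypothetical Python sketch, not the author's process: it assumes a record of when each seed address was published and a per-address count of undeliverable messages (for example, tallied from the mail log), and the 14-day incubation and 5-message threshold are illustrative guesses. Seeds that pass both checks are emitted as /etc/aliases entries pointing at the spam account.

```python
from datetime import date, timedelta

TRAP_ACCOUNT = "spam"  # hypothetical destination account for spam-trap mail


def ready_seeds(seeds, bounce_counts, today, incubate_days=14, min_bounces=5):
    """Return seeds that are old enough and attracting enough undeliverable mail.

    seeds: mapping of local part -> date the address was published
    bounce_counts: mapping of local part -> undeliverable messages seen so far
    """
    cutoff = today - timedelta(days=incubate_days)
    return sorted(
        name
        for name, published in seeds.items()
        if published <= cutoff and bounce_counts.get(name, 0) >= min_bounces
    )


def alias_lines(names, target=TRAP_ACCOUNT):
    """Format /etc/aliases entries routing each seed into the trap account."""
    return [f"{name}: {target}" for name in names]


if __name__ == "__main__":
    seeds = {"netmon-example": date(2004, 9, 1), "fresh-example": date(2004, 10, 15)}
    counts = {"netmon-example": 12, "fresh-example": 30}
    for line in alias_lines(ready_seeds(seeds, counts, today=date(2004, 10, 18))):
        print(line)
```

Requiring both an age and a volume threshold keeps a freshly published address from being enabled on the strength of a single stray bounce.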



