On Sat, 2004-05-08 at 10:08, Angus Lees wrote:
> From what I can see any spam filter that needs training is missing the
> point

If you use a model where you train your filter with every single message,
then yes, your viewpoint that "needs training is missing the point"
makes sense.

But most people I work with only train their spam filters when something
is mis-classified. 

This has the effect of keeping the database size down a touch, and also
means that the mechanics of "remove this message from the database of
spam words and add it to the database of white words" are unnecessary.

>  - but I've never actually run any of the Bayesian filters so its
> purely ignorant prejudice ;)

My scheme is this: I run bogofilter on the server where my mail is
aggregated as the final step before it gets sorted into folders.

Positives are sent to a "ProbableSpam" folder.
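
For illustration only, here is a rough Python sketch of that filtering step. It is not my actual delivery setup: the function name choose_folder and the "INBOX" fallback are made up for the example, and it assumes bogofilter is on the PATH and reports its verdict through its exit status (0 for spam, per its man page).

```python
import subprocess

# Exit status bogofilter documents for a message it classifies as spam.
# Other documented values: 1 = non-spam, 2 = unsure, 3 = error.
BOGOFILTER_SPAM = 0


def choose_folder(raw_message: bytes, run=subprocess.run) -> str:
    """Return the folder one message should be delivered to.

    `run` is injectable so the routing logic can be exercised without
    bogofilter installed. Folder names are assumptions for this sketch.
    """
    # bogofilter reads the message on stdin; without -u, -s or -n it only
    # classifies and does not touch the word database.
    result = run(["bogofilter"], input=raw_message)
    if result.returncode == BOGOFILTER_SPAM:
        return "ProbableSpam"
    # Non-spam, unsure, and errors all fall through to normal sorting,
    # so a broken filter never hides mail.
    return "INBOX"
```

Anything that does not come back as spam carries on to the usual folder sorting, which is why a false negative simply shows up in the InBox.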

I have two other relevant folders: Spam and NotSpam. I have a cron job
that periodically (say, every 6 hours) takes any messages from those
folders and feeds them into the filter with the appropriate marking, then
empties the Spam and NotSpam folders.
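
A minimal sketch of what such a training job might look like, in Python. This is an assumption-laden example rather than the script I run: it assumes the folders are stored as Maildirs on the server, the function name train_folder is hypothetical, and it relies on bogofilter's -s (register as spam) and -n (register as non-spam) options and its documented exit status 3 for errors.

```python
import mailbox
import subprocess

# Exit status bogofilter documents for I/O or other errors.
BOGOFILTER_ERROR = 3


def train_folder(path, flag, run=subprocess.run):
    """Feed every message in the Maildir at `path` to bogofilter, then delete it.

    `flag` is "-s" to register messages as spam or "-n" to register them
    as non-spam. Returns the number of messages trained on.
    """
    if flag not in ("-s", "-n"):
        raise ValueError("flag must be '-s' or '-n'")

    box = mailbox.Maildir(path, create=False)
    trained = 0
    for key in list(box.keys()):
        raw = box.get_bytes(key)
        result = run(["bogofilter", flag], input=raw)
        if result.returncode == BOGOFILTER_ERROR:
            # Leave the message in place so the next run can retry it.
            raise RuntimeError("bogofilter failed on message %s" % key)
        # Only remove a message once the filter has accepted it.
        box.remove(key)
        trained += 1
    box.close()
    return trained
```

The cron job would then call something like train_folder("/home/user/Maildir/.Spam", "-s") and train_folder("/home/user/Maildir/.NotSpam", "-n"), where both paths are placeholders for wherever the IMAP server keeps those folders.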

If something is misclassified [i.e. a spam message shows up in my InBox],
all I do is move (or copy) it to the Spam or NotSpam folder, and forget
about it. [It's IMAP, so it goes to the server.] A few hours later, the
script will come along and train accordingly.

This may or may not meet your definition of "acceptable usage model" but
<shrug> works for us.

AfC

-- 
Andrew Frederick Cowie
Operational Dynamics Consulting Pty Ltd

Australia: +61 2 9977 6866  North America: +1 646 472 5054

http://www.operationaldynamics.com/
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
