Thursday, June 26, 2003, 11:39:49 AM, Mark wrote:
M> There has been some talk about Popfile on this forum, so I
M> thought I would comment. I have given up on Popfile after an
M> unacceptable number of crucial false positives, very slow
M> system responses, and a lot of work to train.
It's worked well for me, and it's far less hassle than sorting
through the enormous amount of spam I get. My e-mail address has
been around on the Internet for nearly 10 years. In that time,
it's been added to every spam mailing list in existence! :( As
spammers change tactics, I end up doing a bit of retraining every
month or so, when more spam starts leaking through than I find
acceptable. But the "training" is pretty trivial.
I do, however, glance through e-mails categorized as spam before
deleting them. Usually I sort by the "From" address to quickly
see if the e-mails are familiar. Spammers usually have fairly
obvious From addresses.
M> Some of the statistics we hear (99.5% sorting efficiency) are
M> distorted by the a priori probability (i.e., 95% efficiency
M> would be achieved by simply sorting mailing lists and mail from
M> known addresses), so 99.9% really represents 98% or so,
M> meaning that a lot of the not-easily-sorted mail is lost.
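
To put numbers on that argument: if some share of the mail is trivially sortable and is assumed to be sorted without error, then every error falls on the remainder. Here is a rough sketch of the arithmetic. The function name hard_mail_accuracy is made up for illustration, and the error-free assumption about the easy mail is only an assumption, not something Popfile reports.

```python
def hard_mail_accuracy(overall_accuracy, easy_fraction):
    """Accuracy on the not-easily-sorted mail.

    Assumes (for illustration only) that the easy mail is always
    sorted correctly, so all errors land on the hard mail.
    """
    if not 0.0 <= easy_fraction < 1.0:
        raise ValueError("easy_fraction must be in [0, 1)")
    hard_fraction = 1.0 - easy_fraction
    error_rate = 1.0 - overall_accuracy
    return 1.0 - error_rate / hard_fraction


# Overall figures quoted above, with 95% of the mail assumed easy.
for overall in (0.999, 0.995):
    print(overall, round(hard_mail_accuracy(overall, 0.95), 3))
```

With 95% of the mail assumed easy, 99.9% overall works out to about 98% on the rest, which matches the figure quoted above; 99.5% overall would work out to about 90%.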
I classify into 8 buckets:

Emails Classified

Bucket      Classification Count
bat         2,099 (27.19%)
etrade      47 (0.6%)
list        42 (0.54%)
muscle      89 (1.15%)
mvst        881 (11.41%)
personal    674 (8.73%)
spam        3,719 (48.18%)
tennis      167 (2.16%)

Overall accuracy:
Emails classified: 7,719
Classification errors: 154
Accuracy: 98%
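
The accuracy figure follows directly from the two counts above. Here is a small sketch of the calculation; the function name is illustrative, not taken from Popfile.

```python
def classification_accuracy(classified, errors):
    """Fraction of classified e-mails that were not misclassified."""
    if classified <= 0 or not 0 <= errors <= classified:
        raise ValueError("need classified > 0 and 0 <= errors <= classified")
    return (classified - errors) / classified


# Counts taken from the statistics above.
accuracy = classification_accuracy(7719, 154)
print(f"{accuracy:.2%}")
```

That comes to roughly one error in every 50 messages.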
Most of the errors happen between my swim team (mvst) category
and my tennis category, which is kind of tough since some of the
same people are in both categories and the vocabulary in the two
can be similar.
This is probably officially off-topic at this point! :)
--
Dave Kennedy
________________________________________________
Current version is 1.62r | "Using TBUDL" information:
http://www.silverstones.com/thebat/TBUDLInfo.html