I'm planning on participating in TREC 2005 (the spam track) using SpamBayes:

  <http://trec.nist.gov/>
  <http://plg.uwaterloo.ca/~gvcormac/spam/>

Basically, the idea is that a whole lot of filters are run over a few corpora (a couple of public ones and a couple of private ones) and the results are compared.  (Not to say "hey, my filter is best", but to see what works well, where improvements can be made, and all that.)

The testing system is similar to our (Alex's) incremental testing setup - the steps are:

  initialize
  classify emailfile resultfile
  train [ham|spam] emailfile resultfile
  finalize

So there is (or can be) training after each classification.  I'll create scripts (a modified sb_filter, probably) that do each of the steps; a very rough sketch of the shape of such a wrapper is in the P.S. below.  I don't think that train-on-everything is a good idea here, so I'll include some sort of training regime (like the incremental testing setup) as well (maybe train-to-exhaustion?).

I'm interested in doing this:

  o As research that I can work on after I submit my PhD and before I defend it.
  o To see how SpamBayes compares with various types of filter/corpus.
  o As a sideline to other research I'd like to do with SpamBayes (see #1).

To get to the point of the email:

  o Does anyone object to me using SpamBayes in this way?  Everyone will be acknowledged in the write-ups and all that, obviously, and I'm participating as an individual (with tentative ties to my work, and obviously using the work of, but not speaking for, the SpamBayes group).
  o Is anyone else interested in this?  I can certainly report back as things progress, but if anyone is really interested and can spare the time, I'd happily work on it with someone else.

=Tony.Meyer
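P.S. For anyone curious what the wrapper scripts would look like, here is a minimal sketch of the initialize / classify / train / finalize protocol described above.  It is not the real sb_filter modification - the Filter class, the state-file name, and the result-file format are all placeholders I've made up for illustration; the real script would wrap the SpamBayes classifier and database code instead.

#!/usr/bin/env python
"""Sketch of a TREC spam track command wrapper (illustration only)."""

import os
import pickle
import sys

STATE_FILE = "filter.state"   # hypothetical location for persistent state


class Filter:
    """Placeholder filter: just counts tokens seen in ham and spam."""

    def __init__(self):
        self.ham = {}
        self.spam = {}

    def score(self, text):
        # Return a spam probability in [0, 1]; a real filter would do far
        # better than this naive token-count ratio.
        h = s = 0
        for tok in text.split():
            h += self.ham.get(tok, 0)
            s += self.spam.get(tok, 0)
        return 0.5 if h + s == 0 else float(s) / (h + s)

    def train(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        for tok in text.split():
            counts[tok] = counts.get(tok, 0) + 1


def load():
    # Reload the persistent filter state, or start fresh if none exists.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    return Filter()


def save(filt):
    with open(STATE_FILE, "wb") as f:
        pickle.dump(filt, f)


def main(argv):
    if len(argv) < 2:
        raise SystemExit("usage: wrapper.py initialize|classify|train|finalize ...")
    command = argv[1]

    if command == "initialize":
        save(Filter())               # start with an empty database
    elif command == "finalize":
        pass                         # nothing to clean up in this sketch
    elif command == "classify":
        emailfile, resultfile = argv[2], argv[3]
        filt = load()
        with open(emailfile) as f:
            prob = filt.score(f.read())
        verdict = "spam" if prob > 0.5 else "ham"
        # Made-up result format: "<verdict> <score>".
        with open(resultfile, "w") as f:
            f.write("%s %.4f\n" % (verdict, prob))
    elif command == "train":
        judgement, emailfile = argv[2], argv[3]
        # argv[4] (the resultfile) is accepted but ignored in this sketch.
        filt = load()
        with open(emailfile) as f:
            filt.train(f.read(), judgement == "spam")
        save(filt)
    else:
        raise SystemExit("unknown command: %r" % command)


if __name__ == "__main__":
    main(sys.argv)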
<http://trec.nist.gov/> <http://plg.uwaterloo.ca/~gvcormac/spam/> Basically the idea is that a whole lot of filters are run over a few corpora (a couple of public and a couple of private) and the results are compared. (Not to say, "hey, my filter is best", but to see what works well, where improvements can be made, and all that). The testing system is similar to our (Alex's) incremental testing setup - the steps are: initialize classify emailfile resultfile train [ham|spam] emailfile resultfile finalize So there is (or can be) training after each classification. I'll create scripts (a modified sb_filter, probably) that do each of the steps. I don't think that train-on-everything is a good idea here, so will include some sort of training regime (like the incremental testing setup), too (maybe train-to-exhaustion?). I'm interested in doing this: o As research that I can work on after I submit my PhD and before I defend it. o To see how spambayes compares with various types of filter/corpus. o As a sideline to other research I'd like to do with spambayes (see #1). To get to the point of the email: o Does anyone object to me using spambayes in this way? Everyone will be acknowledged in the write-ups and all that, obviously, and I'm participating as an individual (with tentative ties to my work, and obviously using the work, but not speaking for, the spambayes group). o Is anyone else interested in this? I can certainly report back as things progress, but if anyone is really interested and can spare the time, I'd happy work on it with someone else. =Tony.Meyer _______________________________________________ spambayes-dev mailing list spambayes-dev@python.org http://mail.python.org/mailman/listinfo/spambayes-dev