1. SpamNet is a social democracy, and works best for the average (majority) user. Likewise, the more atypical the user is in either their representation in the social democracy (minority), or in their general online habits, the less effective SpamNet will tend to be for them.
[snip]
Further consider that most razor2-agents users are not representative of any average or majority. People who have the patience to install, configure and tweak SpamAssassin, people who have the knowledge and experience to install Perl modules and manipulate .procmailrc's and the like, they are not representative of any average, typical email user.
I've never really bought into the "individual idiosyncrasy" argument, whether with regard to tools like Razor or things like Bayesian classifiers. There's a popular myth in the Bayes community that every individual needs to maintain his own unique Bayes database, in order to account for his unique tastes and the idiosyncrasies of his mail. Those who buy into that myth end up frowning upon site-wide Bayes databases, considering them too great a compromise, when in practice a site-wide Bayes database is no less effective and is the norm at larger sites.
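As a rough illustration of what "site-wide" means here, the sketch below pools two users' token counts into one table. It is a deliberately simplified stand-in for a real Bayes database: the users, tokens, and counts are all made up, and `token_spam_probability` is a hypothetical helper, not the formula any particular filter uses.

```python
from collections import Counter

def token_spam_probability(spam_counts, ham_counts, token):
    """Share of a token's occurrences that were seen in spam (0.5 if unseen)."""
    spam_hits = spam_counts[token]
    ham_hits = ham_counts[token]
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.5

# Hypothetical per-user token counts (illustrative numbers only).
alice_spam = Counter({"v1agra": 3})
alice_ham = Counter({"meeting": 5})
bob_spam = Counter({"v1agra": 2, "meeting": 1})
bob_ham = Counter({"meeting": 4})

# A site-wide database is simply the pooled counts of every user.
site_spam = alice_spam + bob_spam
site_ham = alice_ham + bob_ham

p_obfuscated = token_spam_probability(site_spam, site_ham, "v1agra")
p_ordinary = token_spam_probability(site_spam, site_ham, "meeting")
print(p_obfuscated, p_ordinary)
```

With these made-up counts the obfuscated token is flagged just as decisively in the pooled table as in either user's own table, which is the pattern the argument above expects when users receive the same kinds of spam.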
What fools most casual observers is that the mathematical models underlying statistical classifiers have to be objective and content-neutral. These models make no assumptions about content, so they rely on the premise that if recipients A and B both receive mail X, each has an *independent* probability of deciding that X is spam. In that scenario the model assumes that widely divergent individual taste is the norm: P(A) and P(B) are treated as independent values drawn from a uniform random function. That is, given the same mail X, P(A) might be 90%, while P(B) might be 2%.
In practice, though, these models are flawed in that there's much less "independence" involved. If the subject line for mail X happens to be an obfuscated reference to our favourite erectile dysfunction medication, padded for good measure with a bunch of spaces, and followed by a hash-buster or tracking sequence, it's hard to imagine that P(A) and P(B) would be that divergent. Even a physician or pharmacist won't have any trouble identifying such an e-mail as spam, given the obfuscation and other "obvious" spam hallmarks. The argument that someone, somewhere might *not* classify that mail as spam is hard to make.
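The difference between the two pictures can be sketched with a small simulation. This is only an illustration under stated assumptions: the "independent" case draws each recipient's probability uniformly at random, while the "shared content" case assumes every message is either obviously spam or obviously ham (the 0.98 and 0.02 figures are invented for the example), so both recipients see the same cues.

```python
import random

def agreement_rate(draw_probs, trials=10_000, seed=0):
    """Fraction of messages on which recipients A and B reach the same verdict."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        p_a, p_b = draw_probs(rng)
        a_says_spam = rng.random() < p_a
        b_says_spam = rng.random() < p_b
        agree += a_says_spam == b_says_spam
    return agree / trials

def independent_tastes(rng):
    # The model's assumption: P(A) and P(B) are unrelated uniform draws.
    return rng.random(), rng.random()

def shared_content(rng):
    # Illustrative assumption: the message itself is obviously spam or
    # obviously ham, so both recipients share the same probability.
    p = rng.choice([0.02, 0.98])
    return p, p

independent = agreement_rate(independent_tastes)
shared = agreement_rate(shared_content)
print(f"independent tastes: {independent:.2f}, shared content: {shared:.2f}")
```

Under the independence assumption the two recipients agree about half the time, no better than a coin flip; when the content drives the verdict they agree on nearly every message, which is closer to what the obfuscated-subject-line example describes.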
The fact that spam, by definition, is unsolicited and sent in bulk ensures that there's no discrimination on the spammer's part with regard to who receives what content. *Everybody* receives the same types of spam, eventually, once their e-mail addresses make it onto one of the infamous lists spammers sell to one another. Whether the recipient is an "average user" or a "power user" doesn't affect the nature of the spam he receives, and provided that you give both of them the tools to report that spam, you shouldn't see any differences attributable to "social democracy". Some might report faster than others, some might take the time to report more items than others, but *what* they report should not be significantly different.
It's also worth noting that many of the "power users" out there are integrating Razor in the context of some broader spam-filtering framework at the mail server level, such that while the reports themselves may be sent by the site operator, the actual spam diagnoses are being made by the "average users" that are serviced by that site. With SpamAssassin integrating Razor checks, and SpamAssassin in widespread use at ISPs around the world, I'd argue that many (most?) of the reports being filed via razor2-agents are actually those of "average users", just as with SpamNet. Certainly that's the way ISP tools like Maia Mailguard work--the end-users confirm the status of the mail as spam or ham using a web interface, and the reporting to Razor/Pyzor/DCC gets handled behind the scenes.
Robert LeBlanc <[EMAIL PROTECTED]>
Renaissoft, Inc.
Maia Mailguard <http://www.renaissoft.com/maia/>