On Sun, Apr 25, 2004 at 12:27:20PM -0400, Tom Allison wrote:
# Could you explain what a "registered users" means?
Registered users are active credentials used to report/revoke. This is what we use to track reputation and trust.

# I am under the impression that I am registered in that I am capable
# of reporting spam back to Razor2. But I'm pretty sure that is not
# the same meaning as you have used here.

No, that is the same meaning.

# I'm also interested in knowing what the issues might be that
# contribute to a poor detection rate.

There are a multitude of issues to consider around poor detection rates, and to cover them all in detail really would require an extensive, whitepaper-like effort. I'll try to summarize a few important ones here:

1. SpamNet is a social democracy, and works best for the average (majority) user. Likewise, the more atypical the user is in either their representation in the social democracy (minority), or in their general online habits, the less effective SpamNet will tend to be for them.

Consider that there exists some statistical distribution of Spam over the populace at any given moment. Those who receive Spam common to the populace (attracted by the basic online habits of the average SpamNet user) see the best accuracy because there is a statistically greater number of trusted people seeing and blocking the common Spam, which raises its confidence level rapidly within the system.

Further consider that most razor2-agents users are not representative of any average or majority. People who have the patience to install, configure and tweak SpamAssassin, people who have the knowledge and experience to install Perl modules and manipulate .procmailrc files and the like, are not representative of any average, typical email user. Even though it is precisely these types of individuals that build and run the Internet as we know it, they are by no means a majority, and thus a social democracy of this type will work less well for them.
It *does* work, and some razor2-agents users do enjoy decent accuracy, but it works less well relative to the majority, in part because Spam is temporal and, given fewer eyes, the confidence level takes longer to rise. (Just to be clear, though, the previous statements made about content taking hours, maybe days, to become known as Spam in the system are completely uninformed and just utter nonsense. The common case is measured in seconds, occasionally minutes.)

2. SpamNet and all clients that interact with it support multiple signature schemes. This should be readily apparent to anyone who looks at a razor2-agents log or sniffs the traffic. This is one lesson we learned from Razor (v1), which only had one signature scheme (an SHA1 with some normalization of input). To change the signature scheme would have meant invalidating all the knowledge the system had accumulated in one fell swoop, resetting the state back to 0. Obviously this would have been an untenable result.

Because we designed SpamNet with an extensible data model on the backend that allows new schemes to come in and old ones to be retired without significantly affecting the global state of knowledge of what is Spam, we are able to continue research and development on all aspects of the system, and are constantly adding new features and functionality, improving performance, scaling, etc.

Given how the system works, various clients use a subset, superset or a completely different set of signature algorithms than others to achieve greater accuracy. The schemes are all tied together within the backend, and being connected, each algorithm improves the efficacy of the system overall, but results can vary depending on which signature schemes are being employed. razor2-agents employs only the nsha1 (normalized sha1) and ehash (ephemeral hash) algorithms, if memory serves correctly. Clearly, this is not the full set of signature schemes that exists within the SpamNet system.
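
To make point 2 a little more concrete, here is a minimal Python sketch of a store that keeps knowledge per signature scheme, so that one scheme can be retired without resetting everything else. Please treat it as an illustration only: the normalization used in `nsha1` (lowercasing and collapsing whitespace), the second scheme `toy_digits_stripped`, and the `SignatureStore` class are all assumptions made up for this example. None of it is taken from razor2-agents or from the SpamNet backend, and ehash is not modeled at all.

```python
import hashlib
import re


def nsha1(text: str) -> str:
    """SHA-1 over normalized text.

    Assumption: normalization = strip, lowercase, collapse whitespace.
    The real normalization rules are not described in this thread.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def toy_digits_stripped(text: str) -> str:
    """Hypothetical second scheme: same as nsha1, but digits are removed first."""
    return nsha1(re.sub(r"\d+", "", text))


class SignatureStore:
    """Hypothetical backend model: one set of known-Spam signatures per scheme."""

    def __init__(self):
        self.schemes = {}  # scheme name -> signature function
        self.known = {}    # scheme name -> set of reported signatures

    def add_scheme(self, name, func):
        self.schemes[name] = func
        self.known.setdefault(name, set())

    def retire_scheme(self, name):
        # Only this scheme's signatures go away; other schemes keep theirs.
        self.schemes.pop(name, None)
        self.known.pop(name, None)

    def report(self, text):
        # A report is recorded under every active scheme.
        for name, func in self.schemes.items():
            self.known[name].add(func(text))

    def check(self, text):
        # A message is flagged if any active scheme recognizes it.
        return any(
            func(text) in self.known[name]
            for name, func in self.schemes.items()
        )


store = SignatureStore()
store.add_scheme("nsha1", nsha1)
store.add_scheme("toy_digits_stripped", toy_digits_stripped)
store.report("Buy   CHEAP pills now 12345")

hit_exact = store.check("buy cheap pills now 12345")    # caught by nsha1
hit_variant = store.check("Buy cheap pills now 99999")  # caught by the toy scheme

store.retire_scheme("nsha1")
still_known = store.check("buy cheap pills now 12345")  # toy scheme still knows it
```

With a single-scheme design, the equivalent of `retire_scheme` would empty the whole store. Here the remaining scheme still recognizes the earlier report, which is also why a client that only speaks a subset of the schemes can see different results from one that speaks more of them.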

There are several other important points to consider, but I think this is a decent start. Perhaps Vipul and/or others would like to share their thoughts on this thread.

Best,

--jordan