On Sun, Apr 25, 2004 at 12:27:20PM -0400, Tom Allison wrote:
# Could you explain what a "registered users" means?
Registered users are active credentials used to report/revoke. This
is what we use to track reputation and trust.
# I am under the impression that I am registered in that I am capable
# of reporting spam back to Razor2. But I'm pretty sure that is not
# the same meaning as you have used here.
No, that is the same meaning.
# I'm also interested in knowing what the issues might be that
# contribute to a poor detection rate.
There are a multitude of issues to consider around poor detection
rates, and to cover them all in detail really would require an
extensive, whitepaper-like effort. I'll try to summarize a few
important ones here:
1. SpamNet is a social democracy, and works best for the average
(majority) user. Conversely, the more atypical the user is in
either their representation in the social democracy (minority), or
in their general online habits, the less effective SpamNet will
tend to be for them.
Consider that there exists some statistical distribution of Spam
over the populace at any given moment. Those that receive Spam
common to the populace (attracted by the basic online habits of
the average SpamNet user) see the best accuracy because there is a
statistically greater number of trusted people seeing and blocking
the common Spam, which raises its confidence level rapidly within
the system.
Further consider that most razor2-agents users are not
representative of any average or majority. People who have the
patience to install, configure and tweak SpamAssassin, people who
have the knowledge and experience to install Perl modules and
manipulate .procmailrc's and the like, they are not representative
of any average, typical email user. Even though it is precisely
these types of individuals that build and run the Internet as we
know it, they are by no means a majority and thus a social
democracy of this type will work less well for them. It *does* work,
and some razor2-agents users do enjoy decent accuracy, but it
works less well relative to the majority, in part because Spam is
temporal and, given fewer eyes, the confidence level takes longer to
rise.
(Just to be clear, though, the previous statements made about
content taking hours, maybe days, to become known as Spam in the
system are completely uninformed and just utter nonsense. The
common case is measured in seconds, occasionally minutes.)
2. SpamNet and all clients that interact with it support multiple
signature schemes. This should be readily apparent to anyone that
looks at a razor2-agents log or packet sniffs the traffic.
This is one lesson we learned from Razor (v1), which only had one
signature scheme (an SHA1 with some normalization of input). To
change the signature scheme would mean invalidating all the
knowledge the system had accumulated in one fell swoop, resetting
the state back to 0. Obviously this would have been an untenable
result.
Being that we designed SpamNet with an extensible data model on
the backend that allows new schemes to come in and old ones to be
retired without significantly affecting the global state of
knowledge of what is Spam, we are able to continue research and
development on all aspects of the system, and are constantly
adding new features and functionality, improving performance,
scaling, etc.
Given how the system works, various clients use a subset, superset
or a completely different set of signature algorithms than others
to achieve greater accuracy. The schemes are all tied together
within the backend, and being connected, each algorithm improves
the efficacy of the system overall, but results can vary depending
on which signature schemes are being employed.
razor2-agents employs only the nsha1 (normalized sha1) and ehash
(ephemeral hash) algorithms, if memory serves correctly. Clearly,
this is not the full set of signature schemes that exists within
the SpamNet system.
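
To make point 2 a bit more concrete, here is a rough sketch in Python of
the general idea: several signature schemes index the same underlying
record, so one scheme can be retired without resetting what is already
known. This is NOT the SpamNet backend or the razor2-agents code.
SignatureStore, plain_sha1 and the normalization inside nsha1 (collapse
whitespace, lowercase) are assumptions made up purely for illustration;
the real nsha1 normalization is not described here.

```python
import hashlib
import re


def plain_sha1(body: str) -> str:
    # SHA-1 over the raw text, with no normalization at all.
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def nsha1(body: str) -> str:
    # ASSUMPTION: a stand-in "normalized SHA-1". The real normalization
    # rules are not given in this thread; this one collapses whitespace
    # and lowercases the text before hashing.
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SignatureStore:
    """Hypothetical store in which every scheme points at shared records."""

    def __init__(self, schemes):
        self.schemes = dict(schemes)  # scheme name -> signature function
        self.index = {}               # (scheme name, signature) -> record id
        self.reports = {}             # record id -> number of reports

    def _lookup(self, body):
        # Try every active scheme; any single match finds the record.
        for name, fn in self.schemes.items():
            record_id = self.index.get((name, fn(body)))
            if record_id is not None:
                return record_id
        return None

    def report(self, body):
        record_id = self._lookup(body)
        if record_id is None:
            record_id = len(self.reports)
            self.reports[record_id] = 0
        self.reports[record_id] += 1
        # Index the record under every active scheme.
        for name, fn in self.schemes.items():
            self.index[(name, fn(body))] = record_id
        return record_id

    def check(self, body):
        record_id = self._lookup(body)
        return 0 if record_id is None else self.reports[record_id]

    def retire(self, name):
        # Drop one scheme and its index entries; the records survive.
        del self.schemes[name]
        self.index = {k: v for k, v in self.index.items() if k[0] != name}
```

For example, after store = SignatureStore({"sha1": plain_sha1, "nsha1":
nsha1}) and store.report("Buy  NOW!\n"), a check of "buy now!" still
finds the record through the normalized scheme, and it continues to do
so after store.retire("sha1"). With only a single scheme, retiring it
would leave nothing to look the record up by, which is the Razor (v1)
problem described above.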
There are several other important points to consider, but I think this
is a decent start. Perhaps Vipul and/or others would like to share
their thoughts on this thread.
Best,
--jordan
