On Sun, Apr 25, 2004 at 12:27:20PM -0400, Tom Allison wrote:

# Could you explain what "registered users" means?

Registered users are users holding active credentials, which are used
to report/revoke.  Those credentials are what we use to track
reputation and trust.

# I am under the impression that I am registered in that I am capable
# of reporting spam back to Razor2.  But I'm pretty sure that is not
# the same meaning as you have used here.

No, that is the same meaning.

# I'm also interested in knowing what the issues might be that
# contribute to a poor detection rate.

There are a multitude of issues to consider around poor detection
rates, and to cover them all in detail really would require an
extensive, whitepaper-like effort.  I'll try to summarize a few
important ones here:

 1. SpamNet is a social democracy, and works best for the average
    (majority) user.  Conversely, the more atypical a user is, either
    in their representation in the social democracy (minority) or in
    their general online habits, the less effective SpamNet will tend
    to be for them.

    Consider that there exists some statistical distribution of Spam
    over the populace at any given moment.  Those that receive Spam
    common to the populace (attracted by the basic online habits of
    the average SpamNet user) see the best accuracy because there is a
    statistically greater number of trusted people seeing and blocking
    the common Spam, which raises its confidence level rapidly within
    the system.

    Further consider that most razor2-agents users are not
    representative of any average or majority.  People who have the
    patience to install, configure and tweak SpamAssassin, and who
    have the knowledge and experience to install Perl modules and
    manipulate .procmailrc files and the like, are not representative
    of the average, typical email user.  Even though it is precisely
    these types of individuals who build and run the Internet as we
    know it, they are by no means a majority, and thus a social
    democracy of this type will work less well for them.  It *does*
    work, and some razor2-agents users do enjoy decent accuracy, but it
    works less well for them than for the majority, in part because
    Spam is temporal and, with fewer eyes, the confidence level takes
    longer to rise.

    (Just to be clear, though, the previous statements that content
    takes hours, maybe days, to become known as Spam in the system are
    completely uninformed and just utter nonsense.  The common case is
    measured in seconds, occasionally minutes.)

 2. SpamNet and all clients that interact with it support multiple
    signature schemes.  This should be readily apparent to anyone that
    looks at a razor2-agents log or packet sniffs the traffic.

    This is one lesson we learned from Razor (v1), which only had one
    signature scheme (an SHA1 with some normalization of input).  To
    change the signature scheme would mean invalidating all the
    knowledge the system had accumulated in one fell swoop, resetting
    the state back to 0.  Obviously this would have been an untenable
    result.

    Because we designed SpamNet with an extensible backend data model
    that allows new schemes to come in and old ones to be retired
    without significantly affecting the global state of knowledge of
    what is Spam, we are able to continue research and development on
    all aspects of the system, and are constantly adding new features
    and functionality, improving performance, scaling, etc.

    Given how the system works, various clients use a subset, a
    superset, or a completely different set of signature algorithms
    than others to achieve greater accuracy.  The schemes are all tied
    together within the backend, and because they are connected, each
    algorithm improves the efficacy of the system overall, but results
    can vary depending on which signature schemes are being employed.

    razor2-agents employs only the nsha1 (normalized sha1) and ehash
    (ephemeral hash) algorithms, if memory serves correctly.  Clearly,
    this is not the full set of signature schemes that exists within
    the SpamNet system.
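
A minimal Python sketch of the idea in point 2, where everything is
hypothetical: the two signature functions (`nsha1_like`,
`tokens_sha1`) and the `SignatureStore` class are invented for
illustration and are not razor2-agents' nsha1 or ehash algorithms, nor
SpamNet's backend.  It shows two things under those assumptions:
knowledge stored per (scheme, digest) lets one scheme be retired
without wiping what the other schemes know, and a client employing only
a subset of schemes can miss content that another scheme catches.

```python
import hashlib
import re


# Hypothetical scheme 1: lowercase, collapse whitespace, then SHA-1.
# Loosely modelled on the idea of a "normalized sha1"; this is NOT
# razor2-agents' actual nsha1 normalization.
def nsha1_like(body: str) -> str:
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Hypothetical scheme 2: SHA-1 over the sorted set of alphabetic words,
# so reordering words or changing punctuation leaves the digest unchanged.
# An invented illustration, not ehash or any real SpamNet scheme.
def tokens_sha1(body: str) -> str:
    words = sorted(set(re.findall(r"[a-z]+", body.lower())))
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


SCHEMES = {"nsha1_like": nsha1_like, "tokens_sha1": tokens_sha1}


class SignatureStore:
    """Toy Spam knowledge keyed by (scheme, digest), so each scheme's
    knowledge can be added or retired without touching the others."""

    def __init__(self) -> None:
        self.known_spam = set()  # set of (scheme_name, hex_digest) tuples

    def report(self, body: str, schemes: list) -> None:
        # A client records the message under every scheme it supports.
        for name in schemes:
            self.known_spam.add((name, SCHEMES[name](body)))

    def retire(self, scheme: str) -> None:
        # Retiring one scheme drops only that scheme's entries.
        self.known_spam = {key for key in self.known_spam if key[0] != scheme}

    def check(self, body: str, schemes: list) -> list:
        # Names of the client's schemes under which body matches known Spam.
        return [n for n in schemes if (n, SCHEMES[n](body)) in self.known_spam]


if __name__ == "__main__":
    store = SignatureStore()
    store.report("Buy CHEAP pills now!", ["nsha1_like", "tokens_sha1"])
    variant = "now buy cheap pills!"
    # A client using both schemes still catches the reordered variant...
    print(store.check(variant, ["nsha1_like", "tokens_sha1"]))  # ['tokens_sha1']
    # ...while a client using only the first scheme misses it.
    print(store.check(variant, ["nsha1_like"]))  # []
```

Keying knowledge by scheme is what makes retirement cheap in this toy;
with a single global digest (the Razor v1 situation described above),
changing the hash function would orphan every stored entry at once.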

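The point about confidence rising faster when more trusted eyes see the
same Spam can be illustrated with a toy model.  This is an
assumption-laden sketch, not SpamNet's actual trust or confidence
computation: each report carries a hypothetical trust value in [0, 1],
reports are treated as independent, and confidence is combined
noisy-OR style as 1 - product(1 - trust_i).  The names
`SignatureConfidence` and `steps_to_block` and the 0.95 threshold are
invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SignatureConfidence:
    """Toy noisy-OR confidence for one signature; NOT SpamNet's real math."""
    threshold: float = 0.95
    reports: list = field(default_factory=list)  # trust value of each reporter

    def report(self, trust: float) -> None:
        if not 0.0 <= trust <= 1.0:
            raise ValueError("trust must be in [0, 1]")
        self.reports.append(trust)

    def confidence(self) -> float:
        # Assuming independent reporters, the chance that every report is
        # wrong is the product of (1 - trust); confidence is its complement.
        p_all_wrong = 1.0
        for trust in self.reports:
            p_all_wrong *= 1.0 - trust
        return 1.0 - p_all_wrong

    def is_spam(self) -> bool:
        return self.confidence() >= self.threshold


def steps_to_block(reports_per_step: int, trust: float, threshold: float = 0.95) -> int:
    """Time steps until the toy confidence crosses the threshold, given a
    steady number of equally trusted reports per step."""
    if reports_per_step < 1 or not 0.0 < trust <= 1.0 or not 0.0 < threshold < 1.0:
        raise ValueError("need reports_per_step >= 1, 0 < trust <= 1, 0 < threshold < 1")
    sig = SignatureConfidence(threshold=threshold)
    steps = 0
    while not sig.is_spam():
        steps += 1
        for _ in range(reports_per_step):
            sig.report(trust)
    return steps


if __name__ == "__main__":
    # Same per-reporter trust, different numbers of reporters per step.
    for rate in (10, 3, 1):
        print(rate, steps_to_block(rate, trust=0.3))  # 10 1 / 3 3 / 1 9
```

With the same per-reporter trust, a population producing ten reports
per step crosses the threshold in one step, while one report per step
takes nine.  The time unit is arbitrary and says nothing about
SpamNet's real latencies.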

There are several other important points to consider, but I think this
is a decent start.  Perhaps Vipul and/or others would like to share
their thoughts on this thread.

Best,

--jordan
