First -- my name is not Jim.  Second -- I don't care what Spamhaus
does; I'm asking what you suggest SpamAssassin do to measure FPs.

--j.

On Mon, Nov 16, 2009 at 06:00, rich...@buzzhost.co.uk
<rich...@buzzhost.co.uk> wrote:
> On Sun, 2009-11-15 at 20:34 +0000, Justin Mason wrote:
>> On Sun, Nov 15, 2009 at 08:53, rich...@buzzhost.co.uk
>> <rich...@buzzhost.co.uk> wrote:
>> > On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
>> >> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3c4ad11c44.9030...@redhat.com%3e
>> >> Compare this report to a similar report last month.
>> >>
>> >> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> >> The results below are only as good as the data submitted by nightly
>> >> masscheck volunteers.  Please join us in nightly masschecks to increase
>> >>   the sample size of the corpora so we can have greater confidence in
>> >> the nightly statistics.
>> >>
>> >> http://ruleqa.spamassassin.org/20091114-r836144-n
>> >> Spam 131399 messages from 18 users
>> >> Ham  189948 messages from 18 users
>> >>
>> >> ============================
>> >> DNSBL lastexternal by Safety
>> >> ============================
>> >> SPAM%    HAM%    RANK RULE
>> >> 12.8342% 0.0021% 0.94 RCVD_IN_PSBL *
>> >> 12.3053% 0.0026% 0.94 RCVD_IN_XBL
>> >> 31.2499% 0.0827% 0.87 RCVD_IN_ANBREP_BL *2
>> >> 80.2578% 0.1485% 0.86 RCVD_IN_PBL
>> >> 27.1836% 0.1985% 0.79 RCVD_IN_SORBS_DUL
>> >> 19.8213% 0.1785% 0.79 RCVD_IN_SEMBLACK *
>> >> 90.9360% 0.3854% 0.77 RCVD_IN_BRBL_LASTEXT
>> >> 13.0564% 0.4838% 0.67 RCVD_IN_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> * PSBL and XBL lead in apparent safety.
>> >> * ANBREP was added after the October report and has made a surprisingly
>> >> strong showing in this past month.  ANBREP is currently unavailable to
>> >> the general public.  The list owner is thinking about going public with
>> >> the list, which I would encourage because they are clearly doing
>> >> something right.  It seems he would need a global network of automated
>> >> mirrors to be able to scale.  He would also need a listing/delisting
>> >> policy clearly stated on a web page somewhere.
>> >> * SEMBLACK has consistently been performing adequately in safety while
>> >> catching a respectable amount of spam.  I personally use this
>> >> non-default blacklist.
>> >> * It is clear that the two main blacklists are Spamhaus and BRBL.  The
>> >> Zen combination of Spamhaus zones is extremely effective and generally
>> >> safe.  BRBL has a high hit rate as well, with a moderate safety rating.
>> >> * HOSTKARMA_BL has ranked dead last in safety for several weeks in a
>> >> row, while being no more effective against spam than PSBL, XBL or
>> >> SEMBLACK.
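
For reference, the SPAM% and HAM% columns above are plain hit rates
against the hand-classified corpora, with HAM% being the measured
false-positive rate; the RANK column is a composite score that the
sketch below does not reproduce.  A minimal illustration in Python
(the hit counts here are hypothetical):

    def masscheck_stats(spam_hits, ham_hits, n_spam, n_ham):
        """Per-rule hit rates from hand-classified spam and ham corpora."""
        spam_pct = 100.0 * spam_hits / n_spam   # SPAM% column
        ham_pct = 100.0 * ham_hits / n_ham      # HAM% column: the FP rate
        return spam_pct, ham_pct

    # Hypothetical hit counts against the corpus sizes quoted above:
    spam_pct, ham_pct = masscheck_stats(16864, 4, 131399, 189948)
    print("%.4f%% of spam hit, %.4f%% of ham hit (false positives)"
          % (spam_pct, ham_pct))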
>> >>
>> >> ===============================
>> >> HOSTKARMA_BL much better as URIBL
>> >> ===============================
>> >> SPAM%    HAM%    RANK RULE
>> >> 68.3651% 0.2806% 0.79 URIBL_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
>> >> effective as a URIBL.  This is curious as it seems it was not designed
>> >> to be used as a URIBL.  In any case, as long as our masschecks show good
>> >> statistics like this, I will personally use it on my own SpamAssassin
>> >> server.
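
A URIBL check of this kind takes the domains found in a message's URIs
and looks them up in the list's DNS zone.  A rough sketch using the
third-party dnspython package; the zone name and the 127.0.0.2
"blacklisted" return code are assumptions here, so check the list's
own documentation before relying on them:

    import dns.resolver

    URIBL_ZONE = "hostkarma.junkemailfilter.com"   # assumed zone name
    BLACK = "127.0.0.2"                            # assumed "blacklisted" code

    def uribl_listed(domain, zone=URIBL_ZONE):
        """True if the domain resolves to the blacklist code in the zone."""
        try:
            answers = dns.resolver.resolve("%s.%s" % (domain, zone), "A")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False
        return any(r.to_text() == BLACK for r in answers)

    print(uribl_listed("example.com"))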
>> >>
>> >> =========================
>> >> SPAMCOP Dangerous?
>> >> =========================
>> >> SPAM%    HAM%    RANK RULE
>> >> 17.4225% 2.6076% 0.56 RCVD_IN_BL_SPAMCOP_NET *
>> >>
>> >> Commentary:
>> >> Is Spamcop seriously this bad?  It has consistently shown a high false
>> >> positive rate over these past weeks.  Was it safer than this in the past,
>> >> to warrant the current high score in spamassassin-3.2.5?
>> >>
>> >> Warren Togami
>> >> wtog...@redhat.com
>> >
>> > Is it not a bit flawed to do the metrics on volunteer submissions, given
>> > that Spamhaus is said to have a small army of them? It means the data
>> > cannot be relied upon as any kind of sensible comparison.
>>
>> please explain.  How would you suggest measuring false positives?
>>
> Do you think that volunteer submissions are an accurate way to do them,
> or do you think that is open to abuse?
>
> For example, say I am Steve Linford with a small army of volunteers. I
> get a few false positives come in from Spamhaus, and a few from SORBS.
> What is my inclination when I submit the data?
>
> It takes only a small amount of research and a trawl through the NANAE
> archives to get a handle on the problem, and the general abuse and
> nefarious goings-on among DNSBL volunteers. It is fair to say that there
> is not much love lost.
>
> I'm not pretending I have the answers, so it's probably better to take
> these lists with a large bucket of salt and find out how any given DNSBL
> works for a given organisation.
>
> In a world where presidents and world leaders in America, Zimbabwe and
> Afghanistan get 'elected' on tainted data, some random RBL 'comparison'
> list is trivial by comparison. It must, however, be duly remembered
> that there are many competing 'sides' in the world of DNSBLs, each
> looking to discredit the other.
>
> Perhaps, Jim, as you posed the question, you have some strong feelings
> on the matter that you would like to share?
>
>



-- 
--j.
