First -- my name is not Jim. Secondly -- I don't care what Spamhaus does, I'm asking what you suggest SpamAssassin do to measure FPs.
--j.

On Mon, Nov 16, 2009 at 06:00, rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
> On Sun, 2009-11-15 at 20:34 +0000, Justin Mason wrote:
>> On Sun, Nov 15, 2009 at 08:53, rich...@buzzhost.co.uk
>> <rich...@buzzhost.co.uk> wrote:
>> > On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
>> >> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3c4ad11c44.9030...@redhat.com%3e
>> >> Compare this report to a similar report last month.
>> >>
>> >> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> >> The results below are only as good as the data submitted by nightly
>> >> masscheck volunteers. Please join us in nightly masschecks to increase
>> >> the sample size of the corpora so we can have greater confidence in
>> >> the nightly statistics.
>> >>
>> >> http://ruleqa.spamassassin.org/20091114-r836144-n
>> >> Spam 131399 messages from 18 users
>> >> Ham 189948 messages from 18 users
>> >>
>> >> ============================
>> >> DNSBL lastexternal by Safety
>> >> ============================
>> >> SPAM%     HAM%     RANK  RULE
>> >> 12.8342%  0.0021%  0.94  RCVD_IN_PSBL *
>> >> 12.3053%  0.0026%  0.94  RCVD_IN_XBL
>> >> 31.2499%  0.0827%  0.87  RCVD_IN_ANBREP_BL *2
>> >> 80.2578%  0.1485%  0.86  RCVD_IN_PBL
>> >> 27.1836%  0.1985%  0.79  RCVD_IN_SORBS_DUL
>> >> 19.8213%  0.1785%  0.79  RCVD_IN_SEMBLACK *
>> >> 90.9360%  0.3854%  0.77  RCVD_IN_BRBL_LASTEXT
>> >> 13.0564%  0.4838%  0.67  RCVD_IN_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> * PSBL and XBL lead in apparent safety.
>> >> * ANBREP was added after the October report and has made a surprisingly
>> >> strong showing in this past month. ANBREP is currently unavailable to
>> >> the general public. The list owner is thinking about going public with
>> >> the list, which I would encourage because they are clearly doing
>> >> something right. It seems he would need a global network of automated
>> >> mirrors to be able to scale. He would also need listing/delisting
>> >> policy clearly stated on a web page somewhere.
>> >> * SEMBLACK consistently has been performing adequately in safety while
>> >> catching a respectable amount of spam. I personally use this
>> >> non-default blacklist.
>> >> * It is clear that the two main blacklists are Spamhaus and BRBL. The
>> >> Zen combination of Spamhaus zones is extremely effective and generally
>> >> safe. BRBL has a high hit rate as well, with a moderate safety rating.
>> >> * HOSTKARMA_BL ranks dead last in safety for the past several weeks in a
>> >> row, while not being more effective against spam than PSBL, XBL or
>> >> SEMBLACK.
>> >>
>> >> ===============================
>> >> HOSTKARMA_BL much better as URIBL
>> >> ===============================
>> >> SPAM%     HAM%     RANK  RULE
>> >> 68.3651%  0.2806%  0.79  URIBL_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
>> >> effective as a URIBL. This is curious as it seems it was not designed
>> >> to be used as a URIBL. In any case, as long as our masschecks show good
>> >> statistics like this, I will personally use this on my own spamassassin
>> >> server.
>> >>
>> >> =========================
>> >> SPAMCOP Dangerous?
>> >> =========================
>> >> SPAM%     HAM%     RANK  RULE
>> >> 17.4225%  2.6076%  0.56  RCVD_IN_BL_SPAMCOP_NET *
>> >>
>> >> Commentary:
>> >> Is Spamcop seriously this bad? It consistently has shown a high false
>> >> positive rate in these past weeks. Was it safer than this in the past
>> >> to warrant the current high score in spamassassin-3.2.5?
>> >>
>> >> Warren Togami
>> >> wtog...@redhat.com
>> >
>> > Is it not a bit flawed to do the metrics on volunteer submissions, given
>> > that Spamhaus is said to have a small army of them? It means the data
>> > cannot be relied upon as any kind of sensible comparison.
>>
>> please explain. How would you suggest measuring false positives?
>>
> Do you think that volunteer submissions are an accurate way to do them,
> or do you think that is open to abuse?
>
> For example, say I am Steve Linford with a small army of volunteers. I
> get a few false positives come in from Spamhaus, and a few from SORBS.
> What is my inclination when I submit the data?
>
> It takes only a small amount of research and a trawl through the NANAE
> archives to get a handle on the problem, and the general abuse and
> nefarious goings on with DNSBL volunteers. It is fair to say that there
> is not much love lost.
>
> I'm not pretending I have the answers, so it's probably better to take
> these lists with a large bucket of salt and find how any given DNSBL
> list works for a given organisation.
>
> In a world where presidents and world leaders in America, Zimbabwe and
> Afghanistan get 'elected' on tainted data, some random RBL 'comparison'
> list is trivial by comparison. It must, however, be duly remembered
> that there are many competing 'sides' in the world of the DNSBLs, each
> looking to do the other discredit.
>
> Perhaps Jim, as you posed the question - you have some strong feelings
> on the matter that you would like to share?
>
>

--
--j.
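
For a sense of scale on the figures quoted in the report, here is a rough sketch that turns the published HAM% and SPAM% values back into approximate message counts, using the corpus sizes given there (131399 spam, 189948 ham). The helper name `approx_hits` is made up for this illustration, the published percentages are rounded so the counts are only approximate, and this is not how ruleqa itself computes anything.

```python
# Corpus sizes quoted in the report (ruleqa 20091114-r836144-n).
SPAM_TOTAL = 131399
HAM_TOTAL = 189948

# (rule, SPAM%, HAM%) rows copied from the quoted tables.
ROWS = [
    ("RCVD_IN_PSBL", 12.8342, 0.0021),
    ("RCVD_IN_XBL", 12.3053, 0.0026),
    ("RCVD_IN_BL_SPAMCOP_NET", 17.4225, 2.6076),
]


def approx_hits(percent, total):
    """Hypothetical helper: convert a reported percentage into a rough message count."""
    return round(percent / 100 * total)


# Print approximate spam hits and ham hits (false positives) per rule.
for rule, spam_pct, ham_pct in ROWS:
    print(rule, approx_hits(spam_pct, SPAM_TOTAL), approx_hits(ham_pct, HAM_TOTAL))
```

On these numbers the PSBL ham figure works out to about 4 messages out of 189948, while the Spamcop figure is close to 5000.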