On Sun, Nov 15, 2009 at 08:53, rich...@buzzhost.co.uk <rich...@buzzhost.co.uk> wrote:
> On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
>> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3c4ad11c44.9030...@redhat.com%3e
>> Compare this report to a similar report last month.
>>
>> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> The results below are only as good as the data submitted by nightly
>> masscheck volunteers. Please join us in nightly masschecks to increase
>> the sample size of the corpora so we can have greater confidence in
>> the nightly statistics.
>>
>> http://ruleqa.spamassassin.org/20091114-r836144-n
>> Spam 131399 messages from 18 users
>> Ham 189948 messages from 18 users
>>
>> ============================
>> DNSBL lastexternal by Safety
>> ============================
>> SPAM%     HAM%     RANK  RULE
>> 12.8342%  0.0021%  0.94  RCVD_IN_PSBL *
>> 12.3053%  0.0026%  0.94  RCVD_IN_XBL
>> 31.2499%  0.0827%  0.87  RCVD_IN_ANBREP_BL *2
>> 80.2578%  0.1485%  0.86  RCVD_IN_PBL
>> 27.1836%  0.1985%  0.79  RCVD_IN_SORBS_DUL
>> 19.8213%  0.1785%  0.79  RCVD_IN_SEMBLACK *
>> 90.9360%  0.3854%  0.77  RCVD_IN_BRBL_LASTEXT
>> 13.0564%  0.4838%  0.67  RCVD_IN_HOSTKARMA_BL *
>>
>> Commentary:
>> * PSBL and XBL lead in apparent safety.
>> * ANBREP was added after the October report and has made a surprisingly
>> strong showing in this past month. ANBREP is currently unavailable to
>> the general public. The list owner is thinking about going public with
>> the list, which I would encourage because they are clearly doing
>> something right. It seems he would need a global network of automated
>> mirrors to be able to scale. He would also need listing/delisting
>> policy clearly stated on a web page somewhere.
>> * SEMBLACK consistently has been performing adequately in safety while
>> catching a respectable amount of spam. I personally use this
>> non-default blacklist.
>> * It is clear that the two main blacklists are Spamhaus and BRBL. The
>> Zen combination of Spamhaus zones is extremely effective and generally
>> safe. BRBL has a high hit rate as well, with a moderate safety rating.
>> * HOSTKARMA_BL ranks dead last in safety for the past several weeks in a
>> row, while not being more effective against spam than PSBL, XBL or SEMBLACK.
>>
>> ===============================
>> HOSTKARMA_BL much better as URIBL
>> ===============================
>> SPAM%     HAM%     RANK  RULE
>> 68.3651%  0.2806%  0.79  URIBL_HOSTKARMA_BL *
>>
>> Commentary:
>> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
>> effective as a URIBL. This is curious as it seems it was not designed
>> to be used as a URIBL. In any case as long as our masschecks show good
>> statistics like this, I will personally use this on my own spamassassin
>> server.
>>
>> =========================
>> SPAMCOP Dangerous?
>> =========================
>> SPAM%     HAM%     RANK  RULE
>> 17.4225%  2.6076%  0.56  RCVD_IN_BL_SPAMCOP_NET *
>>
>> Commentary:
>> Is Spamcop seriously this bad? It consistently has shown a high false
>> positive rate in these past weeks. Was it safer than this in the past
>> to warrant the current high score in spamassassin-3.2.5?
>>
>> Warren Togami
>> wtog...@redhat.com
>
> Is it not a bit flawed to do the metrics on volunteer submissions, given
> that Spamhaus is said to have a small army of them? It means the data
> cannot be relied upon as any kind of sensible comparison.

please explain. How would you suggest measuring false positives?

--
--j.
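
For anyone who wants to judge how much weight the HAM% column can bear, here is a rough sketch that turns the quoted percentages back into approximate message counts and puts a Wilson score interval around each one. The ham total and the percentages are copied from the report quoted above; the function names and the choice of a Wilson interval are my own assumptions for illustration, and this is not how ruleqa computes its RANK column. It only captures sampling noise. It says nothing about bias in how the volunteer corpora were collected or classified, which is the concern raised in the thread.

```python
import math

# Ham corpus size, copied from the quoted masscheck report.
HAM_TOTAL = 189948

# HAM% values copied from the quoted tables (a subset, for illustration).
HAM_PERCENT = {
    "RCVD_IN_PSBL": 0.0021,
    "RCVD_IN_XBL": 0.0026,
    "RCVD_IN_HOSTKARMA_BL": 0.4838,
    "RCVD_IN_BL_SPAMCOP_NET": 2.6076,
}


def approx_hits(percent, total):
    """Convert a reported percentage back into an approximate message count."""
    return round(percent / 100.0 * total)


def wilson_interval(hits, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    p = hits / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(ham_percent, total):
    """Return (rule, approximate ham hits, interval low %, interval high %) rows."""
    rows = []
    for rule, percent in ham_percent.items():
        hits = approx_hits(percent, total)
        low, high = wilson_interval(hits, total)
        rows.append((rule, hits, low * 100.0, high * 100.0))
    return rows


if __name__ == "__main__":
    for rule, hits, low, high in summarize(HAM_PERCENT, HAM_TOTAL):
        print(f"{rule:26s} ~{hits:5d} ham hits  [{low:.4f}%, {high:.4f}%]")
```

The point of the exercise: a HAM% of 0.0021% on this corpus corresponds to only about four messages, so the ordering among the safest lists rests on a handful of hits, while the Spamcop figure rests on several thousand.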