Daniel Quinlan <[EMAIL PROTECTED]> writes:

> We can always just test it...

Okay, I tested it on my last 7 days of spam and ham (which I just
generated today).

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
   4895     2868     2027    0.586   0.00    0.00  (all messages)
100.000  58.5904  41.4096    0.586   0.00    0.00  (all messages as %)
 42.451  71.0948   1.9240    0.974   1.00    1.00  URIBL_SBL
  0.204   0.3487   0.0000    1.000   0.97    0.01  T_URIBL_SC_SURBL
  0.756   0.9763   0.4440    0.687   0.22    1.00  URIBL_DSBL
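
The S/O column can be reproduced from the SPAM% and HAM% columns, assuming it is defined as SPAM% / (SPAM% + HAM%). That definition is an assumption, though it agrees with the three rule rows above. A minimal Python sketch (the function name is hypothetical):

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: SPAM% / (SPAM% + HAM%)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit no messages")
    return spam_pct / total


# (SPAM%, HAM%) pairs copied from the table above
ROWS = {
    "URIBL_SBL": (71.0948, 1.9240),
    "T_URIBL_SC_SURBL": (0.3487, 0.0000),
    "URIBL_DSBL": (0.9763, 0.4440),
}

for name, (spam_pct, ham_pct) in ROWS.items():
    print(f"{name:18s} {spam_over_overall(spam_pct, ham_pct):.3f}")
```

A rule that hits only spam gets an S/O of 1.000 no matter how few messages it hits, which is why T_URIBL_SC_SURBL scores perfectly here despite its low SPAM%.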

No FPs for T_URIBL_SC_SURBL, but its SPAM% is rather low.  I suspect the
problem is that SURBL is a direct listing of URIs, whereas URIBL does the
NS->A->RBL mapping.
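
The difference between the two lookup styles can be sketched by building the DNS query names only, with no network access. This is an illustration under assumptions, not SpamAssassin's code: the function names are hypothetical, the zone names sc.surbl.org and sbl.spamhaus.org are assumed, and example.com and 192.0.2.53 are documentation placeholders. The address stands in for whatever the NS->A steps would return for the URI's domain.

```python
import ipaddress


def direct_query(uri_domain: str, zone: str) -> str:
    """Direct listing: the domain from the URI is looked up in the zone as-is."""
    return f"{uri_domain.rstrip('.').lower()}.{zone}"


def ip_query(ipv4: str, zone: str) -> str:
    """IP-based listing: the reversed IPv4 octets are looked up in the zone."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# SURBL style: one lookup keyed on the domain itself.
print(direct_query("example.com", "sc.surbl.org"))

# URIBL style: the nameserver's address (assumed already resolved via
# NS and A lookups) is checked against an IP-based list.
print(ip_query("192.0.2.53", "sbl.spamhaus.org"))
```

In the direct style a domain matches only if that exact domain has been listed, while the IP-based style matches any domain served from a listed nameserver address, which would be consistent with the difference in hit rates.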

Also, my hits were largely confined to the last 4 days as expected
despite the corpus including the last 7 days of my spam:

  first message in corpus: Fri Mar 19 23:11:07 2004
  last message in corpus: Sun Mar 28 05:16:17 2004

  hits:

    Sun Mar 21 10:15:04 2004
    Sun Mar 21 11:16:25 2004
    Wed Mar 24 15:06:53 2004
    Thu Mar 25 12:30:52 2004
    Thu Mar 25 23:56:50 2004
    Fri Mar 26 01:42:13 2004
    Fri Mar 26 01:59:56 2004
    Fri Mar 26 03:45:22 2004
    Fri Mar 26 08:28:00 2004
    Sat Mar 27 05:57:20 2004

  distribution of messages in corpus:

    count   received date
    23      Mar 19
    360     Mar 20
    335     Mar 21
    369     Mar 22
    324     Mar 23
    372     Mar 24
    390     Mar 25
    398     Mar 26
    295     Mar 27
    2       Mar 28
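
The per-day spread of the hits can be tallied from the timestamps listed above. A minimal Python sketch; the helper name is hypothetical, and the parsing assumes English day and month abbreviations (the default C locale):

```python
from collections import Counter
from datetime import datetime

# Hit timestamps copied from the list above
HITS = """
Sun Mar 21 10:15:04 2004
Sun Mar 21 11:16:25 2004
Wed Mar 24 15:06:53 2004
Thu Mar 25 12:30:52 2004
Thu Mar 25 23:56:50 2004
Fri Mar 26 01:42:13 2004
Fri Mar 26 01:59:56 2004
Fri Mar 26 03:45:22 2004
Fri Mar 26 08:28:00 2004
Sat Mar 27 05:57:20 2004
"""


def hits_per_day(text: str) -> Counter:
    """Count hits per calendar day, keyed like the distribution table ("Mar 21")."""
    days = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        days[stamp.strftime("%b %d")] += 1
    return days


for day, count in hits_per_day(HITS).items():
    print(f"{count:<7d} {day}")
```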

This time sensitivity may or may not help with accuracy, but it will
definitely make delayed testing harder.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
