At 10:55 AM 9/29/2004, Chris Santerre wrote:
What was the reason WS got such a low score in SA 3.0??? .5 is a joke! Hell
BigEvil was scored a 3 and no one complained, and it is the same data!! I
don't understand. Did the mass check not go well?

Upon closer inspection, the WS mass-check went pretty well, but WS had the greatest number of nonspam hits of all the SURBL lists. It also hit the most spam, but the OB list hit nearly as much spam, and almost no nonspam.


Since the GA treats FPs as 100 times worse than FNs, the GA is going to heavily bias the score of any overlapping spam hits toward the rule that has the fewest nonspam hits. I suspect that in the spam cases, most of the WS hits also hit either OB or SC, which have better FP ratios, and the scores assigned reflect this.

Admittedly the amount of nonspam WS hit is small (under 0.5%), but that's over 6 times more nonspam than OB hit, and in set1 about 100 times more than SC hit.
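
For anyone who wants to check those ratios, here is a quick sketch in Python. The HAM% figures are copied from the STATISTICS-set1.txt excerpt in this message; the variable names are just mine for illustration.

```python
# HAM% values copied from the STATISTICS-set1.txt excerpt in this message.
ham_pct = {"SC": 0.0046, "OB": 0.0654, "WS": 0.4756}

# How many times more nonspam WS hit than each of its neighbors.
ws_vs_ob = ham_pct["WS"] / ham_pct["OB"]
ws_vs_sc = ham_pct["WS"] / ham_pct["SC"]

print(f"WS vs OB: {ws_vs_ob:.1f}x")  # WS vs OB: 7.3x
print(f"WS vs SC: {ws_vs_sc:.0f}x")  # WS vs SC: 103x
```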

Thus WS got a lowish score not for being a bad rule, but for not doing as well as its neighbors that catch the same spams.
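
To make the 100:1 weighting concrete, here is a rough sketch. It is NOT the GA's actual fitness function: it just multiplies each rule's HAM% by an assumed weight of 100 so the nonspam hits can be compared on the same scale as the spam hits. It also pretends every nonspam hit is an FP, which overstates things, since a message is only an FP if its total score crosses the threshold. The function name and constant are made up for the example; the numbers come from the STATISTICS-set1.txt excerpt.

```python
# Illustrative only: this is not the GA's real fitness function.
FP_WEIGHT = 100  # assumed from "FPs are 100 times worse than FNs"

# (SPAM%, HAM%) per rule, copied from the STATISTICS-set1.txt excerpt.
SET1 = {
    "URIBL_AB_SURBL": (15.8904, 0.0008),
    "URIBL_SC_SURBL": (27.2741, 0.0046),
    "URIBL_OB_SURBL": (74.1861, 0.0654),
    "URIBL_WS_SURBL": (78.4712, 0.4756),
    "URIBL_PH_SURBL": (0.0146, 0.0012),
}


def weighted_ham_cost(ham_pct, fp_weight=FP_WEIGHT):
    """Scale a rule's nonspam hit rate by the FP weight (hypothetical helper)."""
    return fp_weight * ham_pct


costs = {name: weighted_ham_cost(ham) for name, (_spam, ham) in SET1.items()}

for name, (spam, _ham) in SET1.items():
    print(f"{name}: spam {spam:7.4f}  weighted ham cost {costs[name]:6.2f}")
```

On these numbers WS catches about 4.3 points more spam than OB (78.47 vs 74.19), but its weighted nonspam cost is about 47.6 against about 6.5 for OB.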

From STATISTICS-set1.txt:
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 10.497  15.8904   0.0008    1.000   0.98    2.01  URIBL_AB_SURBL
 18.019  27.2741   0.0046    1.000   0.97    3.90  URIBL_SC_SURBL
 49.029  74.1861   0.0654    0.999   0.74    2.00  URIBL_OB_SURBL
 51.999  78.4712   0.4756    0.994   0.45    0.54  URIBL_WS_SURBL
  0.010   0.0146   0.0012    0.927   0.39    0.84  URIBL_PH_SURBL

From STATISTICS-set3.txt:
OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  7.022  14.4233   0.0061    1.000   0.95    4.26  URIBL_SC_SURBL
 30.471  62.5514   0.0632    0.999   0.74    3.21  URIBL_OB_SURBL
  2.950   6.0208   0.0385    0.994   0.73    0.42  URIBL_AB_SURBL
 33.807  68.9994   0.4494    0.994   0.47    1.46  URIBL_WS_SURBL
  0.019   0.0390   0.0008    0.981   0.44    2.00  URIBL_PH_SURBL

grep SURBL 50_scores.cf:
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
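
In case the four columns aren't obvious: each score line carries one value per score set, in the order set 0 (no Bayes, no network tests), set 1 (no Bayes, network tests), set 2 (Bayes, no network tests), set 3 (Bayes and network tests). The SURBL rules are network tests, so sets 0 and 2 are 0, and the set 1 and set 3 values line up with the SCORE columns in the two STATISTICS excerpts above. A small Python sketch that pulls them apart (parse_scores is a throwaway helper of mine, not anything in SA):

```python
# The score lines as pasted from 50_scores.cf in this message.
SCORE_LINES = """\
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
"""


def parse_scores(text):
    """Map rule name -> tuple of four floats (score sets 0..3)."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        # Only handle the four-value form: "score NAME s0 s1 s2 s3".
        if len(parts) == 6 and parts[0] == "score":
            scores[parts[1]] = tuple(float(v) for v in parts[2:])
    return scores


scores = parse_scores(SCORE_LINES)
for name, per_set in sorted(scores.items()):
    print(f"{name}: set1={per_set[1]:.3f} set3={per_set[3]:.3f}")
```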




