> > > > > Another Dallas miracle!
> > > >
> > > > Oh? Er, how does it determine if a message was ham or spam? It "looks like"
> > > > it is rather random based on the reports. BAYES_99 may well hit on 84.33%
> > > > of spam. But I doubt, given its score, it hits on 44.53% of ham.
> > >
>
> The code should be right... It uses spamassassin's judgement, ie
>
> "info: spamd: result: Y 20 - BAYES_99,..."
> "info: spamd: result: . -2 - AWL,...."
>
> 44.53% of your ham hit BAYES_99... That gotta tell you something is
> wrong! My bayes hits break down like
>
> # ./sa-stats.pl -f spamdlog -n 500 | grep BAYES
> For spam...
> 10 BAYES_99 15351 4.46% 45.42% 60.57%
> 19 BAYES_50 6443 1.87% 19.06% 25.42%
> 31 BAYES_80 1154 0.34% 3.41% 4.55%
> 32 BAYES_60 1147 0.33% 3.39% 4.53%
> 38 BAYES_95 864 0.25% 2.56% 3.41%
> 102 BAYES_00 187 0.05% 0.55% 0.74%
> 152 BAYES_40 92 0.03% 0.27% 0.36%
> 209 BAYES_20 53 0.02% 0.16% 0.21%
> 228 BAYES_05 44 0.01% 0.13% 0.17%
>
> For ham...
> 2 BAYES_00 6959 15.73% 20.59% 82.32%
> 9 BAYES_50 623 1.41% 1.84% 7.37%
> 20 BAYES_40 296 0.67% 0.88% 3.50%
> 24 BAYES_20 267 0.60% 0.79% 3.16%
> 29 BAYES_05 217 0.49% 0.64% 2.57%
> 73 BAYES_60 51 0.12% 0.15% 0.60%
> 113 BAYES_99 24 0.05% 0.07% 0.28%
> 142 BAYES_80 14 0.03% 0.04% 0.17%
> 280 BAYES_95 2 0.00% 0.01% 0.02%
>
> So, BAYES_99 hits 0.28% of my ham and 60.57% of my spam.
>
>
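For anyone following along, here is a rough Python sketch of the classification described in the quoted text: take spamd's own verdict from each "result:" line ("Y" for spam, "." for ham) and count rule hits per class. It is not the actual sa-stats.pl code. The line layout is assumed from the two quoted examples, and the regex, the tally() helper and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape, based on the quoted lines: "spamd: result: Y 20 - RULE_A,RULE_B ..."
# "Y" marks a message spamd judged spam, "." marks ham.
RESULT_RE = re.compile(r"spamd: result: ([Y.])\s+(-?\d+) - (\S*)")

def tally(lines):
    """Count messages and per-rule hits separately for spam and ham."""
    totals = Counter()
    hits = {"spam": Counter(), "ham": Counter()}
    for line in lines:
        m = RESULT_RE.search(line)
        if m is None:
            continue  # not a result line
        kind = "spam" if m.group(1) == "Y" else "ham"
        totals[kind] += 1
        for rule in m.group(3).split(","):
            if rule:
                hits[kind][rule] += 1
    return totals, hits

# Hypothetical sample lines, not taken from any real log
sample = [
    "info: spamd: result: Y 20 - BAYES_99,HTML_MESSAGE scantime=1.0",
    "info: spamd: result: . -2 - AWL,BAYES_00 scantime=0.5",
    "info: spamd: result: . 0 - BAYES_00 scantime=0.4",
    "info: spamd: connection from localhost",
]
totals, hits = tally(sample)
print(totals["spam"], totals["ham"])
print(hits["ham"]["BAYES_00"], hits["spam"]["BAYES_99"])
```

With counts split this way, a rule's ham hits and spam hits each have their own denominator, which is what the per-class percentages depend on.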
So from your explanation, I should be ignoring the %ofham column in the spam stats and the %ofspam column in the ham stats? Otherwise the stats don't seem to make much sense:

python# ./sa-stats -f maillog.0 -n 500 | grep BAYES
spam rules...
3 BAYES_99 305 3.49 4.99 46.56 5.59
10 BAYES_50 172 1.97 2.81 26.26 3.15
23 BAYES_00 100 1.14 1.64 15.27 1.83
77 BAYES_80 21 0.24 0.34 3.21 0.38
85 BAYES_95 19 0.22 0.31 2.90 0.35
111 BAYES_60 14 0.16 0.23 2.14 0.26
131 BAYES_05 12 0.14 0.20 1.83 0.22
186 BAYES_20 7 0.08 0.11 1.07 0.13
224 BAYES_40 5 0.06 0.08 0.76 0.09
373 SARE_BAYES_5x8 2 0.02 0.03 0.31 0.04
387 SARE_BAYES_6x8 2 0.02 0.03 0.31 0.04
412 SARE_BAYES_7x8 2 0.02 0.03 0.31 0.04
ham rules...
1 BAYES_00 4079 14.05 66.75 622.75 74.76
BAYES_00 hitting 622% of spam???
6 BAYES_50 771 2.65 12.62 117.71 14.13
25 BAYES_40 238 0.82 3.89 36.34 4.36
35 BAYES_20 190 0.65 3.11 29.01 3.48
40 BAYES_05 148 0.51 2.42 22.60 2.71
173 BAYES_60 15 0.05 0.25 2.29 0.27
232 BAYES_80 9 0.03 0.15 1.37 0.16
310 BAYES_95 5 0.02 0.08 0.76 0.09
349 SARE_BAYES_6x6 4 0.01 0.07 0.61 0.07
416 SARE_BAYES_5x8 2 0.01 0.03 0.31 0.04
496 SARE_BAYES_5x7 1 0.00 0.02 0.15 0.02
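
A quick back-of-the-envelope check of where a figure like 622.75 could come from, as a small Python sketch. The message totals are not printed in the output above; I am inferring them from the reported percentages (305 hits shown as 46.56% of spam, 4079 hits shown as 74.76% of ham), so treat spam_total and ham_total as assumptions. The pct() helper is just for illustration.

```python
def pct(count, total):
    """Percentage of `total` represented by `count`, rounded like the report."""
    return round(100.0 * count / total, 2)

# Totals are inferred from the percentages above, not reported by the script
spam_total = 655    # assumption: 305 BAYES_99 hits are shown as 46.56% of spam
ham_total = 5456    # assumption: 4079 BAYES_00 hits are shown as 74.76% of ham

bayes99_spam_hits = 305   # BAYES_99 count under "spam rules"
bayes00_ham_hits = 4079   # BAYES_00 count under "ham rules"

print(pct(bayes99_spam_hits, spam_total))  # spam hits over spam messages
print(pct(bayes00_ham_hits, ham_total))    # ham hits over ham messages
print(pct(bayes00_ham_hits, spam_total))   # ham hits over spam messages
print(pct(bayes99_spam_hits, ham_total))   # spam hits over ham messages
```

If those totals are right, the last two values are the cross columns: a hit count from one class divided by the message total of the other, which would explain a number above 100.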
Andy