On Sat, 2009-05-23 at 11:26 -0500, Larry Nedry wrote:
> On 5/22/09 at 9:28 PM +0200 Karsten Bräckelmann wrote:
> >An interesting observation is, that the hitrate (in percent) in spam
> >scoring < 15 is an order of magnitude higher than with high-scoring [1]
> >spam. This is rare to find...
>
> My EMAILBL_TEST_LEM hitrate leans heavily toward the other end of the
> spectrum with almost 88% scoring > 15. My data is based on a little more
> than 100,000 emails.

Wait, you're looking at the hits differently than I did.

> Stats for only messages tagged with EMAILBL_TEST_LEM:
>
> 04.5% scored 00.0 - 05.0
> 03.0% scored 05.0 - 10.0
> 04.5% scored 10.0 - 15.0
> 09.1% scored 15.0 - 20.0
> 78.8% scored 20.0 or higher

That's limited to EmailBL hits, so these percentages add up to 100%. For
me that would have been:

  19.4% of mail hitting EmailBL has a score < 15
  80.6% of mail hitting EmailBL has a score > 15

However, more than 98.5% of my spam scores > 15. Taking that into
account, the numbers change drastically. That's what I reported.

  Less than 1% hits in ALL spam with a total score of 15 or higher.
  Yet, 10.9% hits in ALL spam with a score less than 15.

And that's what counts in my book. I don't care if the lion's share of
EmailBL hits are actually high scorers. Those don't need a boost anyway.
What I do care about are hits in the sneaky-ish crap. And that's where
it hits on more than 10%.

Larry, what numbers do you get if you count hits in ALL your spam
in-stream, broken down by scores?

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
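
The two ways of counting discussed above can be told apart with a small sketch. This is only an illustration, not anyone's actual tooling: the function name hit_breakdown, the (score, hit) input format, and the sample counts are all made up for the example, and a score of exactly 15 is assumed to fall into the high band.

```python
def hit_breakdown(messages, threshold=15.0):
    """Compare two views of rule hits in a spam corpus.

    messages: iterable of (score, hit) pairs, spam only (hypothetical format).
    Returns (share_of_hits, rate_in_band), both as percentages per band.
    """
    totals = {"low": 0, "high": 0}
    hits = {"low": 0, "high": 0}
    for score, hit in messages:
        # Assumption: a score of exactly `threshold` counts as "high".
        band = "low" if score < threshold else "high"
        totals[band] += 1
        if hit:
            hits[band] += 1

    all_hits = hits["low"] + hits["high"]
    # View 1: of all mail hitting the rule, what share falls in each band?
    share_of_hits = {
        b: 100.0 * hits[b] / all_hits if all_hits else 0.0 for b in totals
    }
    # View 2: of ALL spam in each band, what percentage hits the rule?
    rate_in_band = {
        b: 100.0 * hits[b] / totals[b] if totals[b] else 0.0 for b in totals
    }
    return share_of_hits, rate_in_band


# Made-up sample: 20 low-scoring spam (2 hits), 980 high-scoring spam (8 hits).
sample = (
    [(5.0, True)] * 2 + [(5.0, False)] * 18
    + [(25.0, True)] * 8 + [(25.0, False)] * 972
)
share, rate = hit_breakdown(sample)
for band in ("low", "high"):
    print(f"{band}: {share[band]:.1f}% of hits, {rate[band]:.1f}% hit rate in band")
```

With this made-up sample, the low band holds only 20% of the hits, yet 10% of low-scoring spam is hit, while under 1% of high-scoring spam is. The first view always sums to 100% over the bands; the second does not, and it is the one that shows where a rule actually helps.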