On Sat, 2009-05-23 at 11:26 -0500, Larry Nedry wrote:
> On 5/22/09 at 9:28 PM +0200 Karsten Bräckelmann wrote:
> >An interesting observation is, that the hitrate (in percent) in spam
> >scoring < 15 is an order of magnitude higher than with high-scoring [1]
> >spam. This is rare to find...
> 
> My EMAILBL_TEST_LEM hitrate leans heavily toward the other end of the
> spectrum with almost 88% scoring > 15.  My data is based on a little more
> than 100,000 emails.

Wait, you're looking at the hits differently than I did.

> Stats for only messages tagged with EMAILBL_TEST_LEM:
> 
> 04.5% scored 00.0 - 05.0
> 03.0% scored 05.0 - 10.0
> 04.5% scored 10.0 - 15.0
> 09.1% scored 15.0 - 20.0
> 78.8% scored 20.0 or higher

That's limited to EmailBL hits, so these percentages add up to 100%.
Counted that way, my numbers would have been:

  19.4%  of mail hitting EmailBL has a score < 15
  80.6%  of mail hitting EmailBL has a score > 15

However, spam scoring > 15 makes up more than 98.5% of my spam. Taking
that into account, the numbers change drastically. That's what I
reported: EmailBL hits on less than 1% of ALL spam with a total score of
15 or higher.

Yet it hits on 10.9% of ALL spam with a score less than 15.

And that's what counts in my book. I don't care if the lion's share of
EmailBL hits are actually high scorers. Those don't need a boost anyway.
What I do care about are hits in the sneaky-ish crap. And that's where
it hits on more than 10%.
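
To make the two ways of counting concrete, here is a minimal sketch. It
is not anything SpamAssassin ships; the function name, the (score, hit)
input format and the sample corpus are all made up for illustration.
It computes both views from the same data: the share of EmailBL hits
that fall into each score band, and the hit rate within each band.

```python
def emailbl_views(messages, threshold=15.0):
    """Return both percentage views for a list of (score, hit) pairs.

    messages  -- hypothetical input: one (score, hit) pair per spam message,
                 where hit is True if the EmailBL rule fired on it
    threshold -- score separating the low band (< threshold) from the
                 high band (>= threshold)
    """
    low_total = low_hits = high_total = high_hits = 0
    for score, hit in messages:
        if score < threshold:
            low_total += 1
            low_hits += 1 if hit else 0
        else:
            high_total += 1
            high_hits += 1 if hit else 0

    def pct(part, whole):
        # Guard against empty bands so the sketch never divides by zero.
        return 100.0 * part / whole if whole else 0.0

    all_hits = low_hits + high_hits
    return {
        # View 1: of all mail hitting the rule, which band is it in?
        "share_of_hits_low": pct(low_hits, all_hits),
        "share_of_hits_high": pct(high_hits, all_hits),
        # View 2: of ALL spam in a band, how much hits the rule?
        "hit_rate_low": pct(low_hits, low_total),
        "hit_rate_high": pct(high_hits, high_total),
    }


# Made-up corpus: 1000 spam, of which 20 score low and 980 score high.
# The rule fires on 2 of the low scorers and 8 of the high scorers.
sample = (
    [(5.0, True)] * 2 + [(5.0, False)] * 18
    + [(25.0, True)] * 8 + [(25.0, False)] * 972
)
views = emailbl_views(sample)
for name, value in views.items():
    print(f"{name}: {value:.1f}%")
```

With this invented corpus, 80% of the hits are high scorers, yet the
rule fires on 10% of the low-scoring spam and on under 1% of the
high-scoring spam. Same data, two very different pictures.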


Larry, what numbers do you get if you count hits in ALL your in-stream
spam, broken down by score?

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
