Theo Van Dinter wrote on Sun, 29 Aug 2004 13:04:35 -0400:
> I'm not sure where he came up with that idea actually.
>
Yeah, later tests showed me that the header and body tests *must* be taking
custom rules into account. I based my assumption on earlier tests today and
probably misinterpreted them. Anyway, here's a typical example where I'm not
sure *exactly* why the message did not get autolearned:
debug: auto-learn: message score: 26.353, computed score for autolearn: 23.74
debug: auto-learn? ham=0.1, spam=8, body-points=14.687, head-points=2.2,
learned-points=1.886
OK, that much is clear: too few head-points. But *where* did I lose them?
Below is a breakdown of the hits. At first and second glance, doesn't it look
like the message should have been autolearned? Only digging deeper reveals
some possible reasons. Let's see:
The overall score of 26, minus the body points of 14, leaves 12; minus the
head points leaves 10; minus BAYES_99 leaves roughly 8. So around 8 score
points counted as neither header nor body. Which rules did they come from?
There's also a recomputed score used for autolearn, and that is only 24
(possibly the overall score minus Bayes). (Actually, it might be interesting
to use the SARE_bayes-poison rules as a basis for NOT autolearning, but that's
obviously not the case here.)
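
To spell out that subtraction, here is a minimal Python sketch using the
figures from the debug lines above. Treating learned-points as the BAYES_99
contribution is my assumption, and the variable names are just for
illustration.

```python
# Figures copied from the "debug: auto-learn" lines quoted above.
message_score = 26.353
body_points = 14.687
head_points = 2.2
# Assumption: learned-points is the Bayes (BAYES_99) contribution.
learned_points = 1.886

# Points that were counted as neither body, header, nor learned.
unaccounted = message_score - body_points - head_points - learned_points
print(f"unaccounted points: {unaccounted:.3f}")
```

That is the "roughly 8" I can't attribute to either header or body tests.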
I assume the missing points belong to rules flagged "noautolearn" etc., or to
rules which are neither header nor body tests. But how can I tell? For
instance, LONGWORDS should count as a body test (no noautolearn flag from what
I can see), but it is not marked as a BODY test in the list, so I don't know
whether it counted. On the other hand, RATWARE_ZERO_TZ is a header test, but
wasn't counted. Looking further, I see that it's actually a meta test based on
header tests, and LONGWORDS is a meta test as well, but based on body tests.
Looks like I found the answer: meta tests don't count at all? I'd rather they
did. As you can see, leaving them out removed exactly the score points this
message needed to get autolearned, even though it scored a whopping 26
overall. I suppose the reason for not using meta tests at all is that it would
take even more processing to determine their nature from the subtests, and
some meta tests are not clearly body or header tests. The problem is that
there are many meta tests, and it seems none of them count.
* 1.0 S_FREE_6 S_FREE_6
* 1.2 HTML_MESSAGE HTML message
* 0.1 TW_JS BODY: Odd Letter Triples with JS
* 0.1 TW_ZF BODY: Odd Letter Triples with ZF
* 0.6 J_CHICKENPOX_42 BODY: 4alpha-pock-2alpha
* 0.1 TW_UW BODY: Odd Letter Triples with UW
* 0.1 TW_XV BODY: Odd Letter Triples with XV
* 0.1 TW_FG BODY: Odd Letter Triples with FG
* 2.0 SPAM_BUY_8 BODY: SPAM_BUY_8
* 0.1 TW_HW BODY: Odd Letter Triples with HW
* 0.1 TW_FH BODY: Odd Letter Triples with FH
* 0.1 TW_VT BODY: Odd Letter Triples with VT
* 0.1 TW_QL BODY: Odd Letter Triples with QL
* 0.1 TW_YY BODY: Odd Letter Triples with YY
* 0.1 TW_UQ BODY: Odd Letter Triples with UQ
* 3.1 STRONG_BUY BODY: Tells you about a strong buy
* 0.1 TW_QH BODY: Odd Letter Triples with QH
* 0.1 TW_TC BODY: Odd Letter Triples with TC
* 0.1 TW_DP BODY: Odd Letter Triples with DP
* 0.1 TW_VQ BODY: Odd Letter Triples with VQ
* 0.1 TW_HJ BODY: Odd Letter Triples with HJ
* 0.8 SARE_BAYES_7x5 BODY: Bayes poison 7x5
* 0.1 TW_CB BODY: Odd Letter Triples with CB
* 2.7 NOT_ADVISOR BODY: Not registered investment advisor
* 1.7 SARE_FWDLOOK BODY: Forward looking statements about stocks
* 0.1 TW_YD BODY: Odd Letter Triples with YD
* 0.8 SARE_BAYES_8x5 BODY: Bayes poison 8x5
* 0.1 TW_IU BODY: Odd Letter Triples with IU
* 0.1 TW_YQ BODY: Odd Letter Triples with YQ
* 0.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* 1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
* 0.2 HTML_10_20 BODY: Message is 10% to 20% HTML
* 0.0 MIME_QP_LONG_LINE RAW: Quoted-printable line longer than 76 chars
* 2.3 LONGWORDS Long string of long words
* 4.1 RATWARE_ZERO_TZ Bulk email fingerprint (+0000) found
* 2.2 SARE_MULT_RATW_02 Spammer sign in headers
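
To see quickly which hits carry a BODY: or RAW: prefix in a report like the
one above and which carry none, a small Python sketch along these lines could
help. It is not part of SpamAssassin; sum_by_tag and HIT_RE are hypothetical
names, and the prefix shown in the report is only a display label, so it need
not match how the autolearn code classifies a rule. The sample is a subset of
the hits listed above.

```python
import re
from collections import defaultdict

# Hypothetical helper: it only reads report lines of the form
# "* <score> <RULE_NAME> [TAG: ]description".
HIT_RE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+(\S+)\s+(?:([A-Z]+):\s)?")

def sum_by_tag(report):
    """Sum hit scores per display tag; hits without a tag go to '(untagged)'."""
    totals = defaultdict(float)
    for line in report.splitlines():
        m = HIT_RE.match(line.strip())
        if m is None:
            continue  # continuation lines such as "* [score: 1.0000]"
        score, _rule, tag = m.groups()
        totals[tag or "(untagged)"] += float(score)
    return dict(totals)

# A subset of the hits from the report above.
SAMPLE = """\
* 2.0 SPAM_BUY_8 BODY: SPAM_BUY_8
* 3.1 STRONG_BUY BODY: Tells you about a strong buy
* 1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
* 0.0 MIME_QP_LONG_LINE RAW: Quoted-printable line longer than 76 chars
* 2.3 LONGWORDS Long string of long words
* 4.1 RATWARE_ZERO_TZ Bulk email fingerprint (+0000) found
* 2.2 SARE_MULT_RATW_02 Spammer sign in headers
"""

totals = sum_by_tag(SAMPLE)
for tag, points in sorted(totals.items()):
    print(f"{tag:12s} {points:5.1f}")
```

The untagged group is where I would look first for meta tests, though the
report alone cannot confirm which of those are meta rules.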
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org