Hi all,

Quite a lot of my mail scores so close to 50% (between 0.49999 and 0.50001)
that no Bayes score shows up in the summary.  This is mostly just an
inconvenience, but I wonder whether there is a "pull" towards the middle that
makes this happen.  I am seeing it in about 2% of my email, much more often
than I would expect statistically.

I run auto-learn with thresholds of -0.1 for ham and 12 for spam, and I
hand-train only what falls on the wrong side of the threshold, mostly "sham"
and mailing-list traffic.  I should add that Bayes is performing beautifully
on the whole, giving BAYES_99 scores to much of the spam and BAYES_00 to much
of the ham.

Here's the debug output for today's Congressional Quarterly update:

debug: bayes token 'norton' => 0.999707779886148
debug: bayes token 'botulism' => 0.999705919796308
debug: bayes token 'Edited' => 0.000346872985170858
debug: bayes token 'codify' => 0.999652892561984
debug: bayes token 'Bush's' => 0.00038401142041399
debug: bayes token 'trillion' => 0.999423220973783
debug: bayes token 'Sept' => 0.000895174708818636
debug: bayes token 'biology' => 0.998604229607251
debug: bayes token 'nod' => 0.998514469453376
debug: bayes token 'H*c:ISO-8859-1' => 0.998355871886121
debug: bayes token 'N:H*c:ISO-NNNN-N' => 0.998355871886121
debug: bayes token 'entrepreneur' => 0.998159362549801
debug: bayes token 'presidential' => 0.997810426540284
debug: bayes token 'sanchez' => 0.997581151832461
debug: bayes token 'Sen' => 0.00243438914027149
debug: bayes token 'toxin' => 0.997447513812155
debug: bayes token 'veto' => 0.997298245614035
debug: bayes token 'heightened' => 0.997298245614035
debug: bayes token 'thrown' => 0.997298245614035
debug: bayes token 'sk:johannh' => 0.00281675392670157
debug: bayes token 'widespread' => 0.996940397350993
debug: bayes token 'IRAQ' => 0.0033416149068323
debug: bayes token 'manslaughter' => 0.996473282442748
debug: bayes token 'Armed' => 0.00381560283687943
debug: bayes token 'predecessor' => 0.996181818181818
debug: bayes token 'POLITICS' => 0.00410687022900763
debug: bayes token 'lawmakers' => 0.00410687022900763
debug: bayes token 'rostrum' => 0.995837837837838
debug: bayes token 'U*johannh' => 0.00444628099173554
debug: bayes token 'Legislative' => 0.00484684684684685
debug: bayes token 'expire' => 0.994923076923077
debug: bayes token 'conviction' => 0.994296296296296
debug: bayes token 'vacant' => 0.994296296296296
debug: bayes token 'Hill' => 0.00578629790239589
debug: bayes token 'WHITE' => 0.00591208791208791
debug: bayes token 'smallpox' => 0.993492957746479
debug: bayes token 'cuts' => 0.992426229508197
debug: bayes token 'slog' => 0.992426229508197
debug: bayes token 'Rayburn' => 0.992426229508197
debug: bayes token 'inspector' => 0.992426229508197
debug: bayes token 'LEAD' => 0.0075774647887324
debug: bayes token 'Ghraib' => 0.0075774647887324
debug: bayes token 'sk:bruderh' => 0.0077612241482167
debug: bayes token 'Associate' => 0.990941176470588
debug: bayes token 'Labor' => 0.990941176470588
debug: bayes token 'awareness' => 0.990941176470588
debug: bayes token 'lisa' => 0.990941176470588
debug: bayes token 'amendments' => 0.990941176470588
debug: bayes token 'HPrecedence:bulk' => 0.0100743380045442
debug: bayes token 'meanwhile' => 0.0103799014049242
debug: bayes token 'POLITICAL' => 0.0105490196078431
debug: bayes token 'authorize' => 0.0105490196078431
debug: bayes token 'transcripts' => 0.0109482190823829
debug: bayes token 'Sioux' => 0.988731707317073
debug: bayes token 'reconciliation' => 0.0121050914533672
debug: bayes token 'interrogation' => 0.0131219512195122
debug: bayes token 'Aug' => 0.0131219512195122
debug: bayes token 'Chuck' => 0.0134463470755773
debug: bayes token 'unsub.php' => 0.986289792995557
debug: bayes token 'operations' => 0.985096774193548
debug: bayes token '$10' => 0.985096774193548
debug: bayes token 'POLICY' => 0.985096774193548
debug: bayes token 'encompass' => 0.985096774193548
debug: bayes token 'Oregon' => 0.985096774193548
debug: bayes token 'jay' => 0.985096774193548
debug: bayes token 'Comprehensive' => 0.985096774193548
debug: bayes token 'Republicans' => 0.985096774193548
debug: bayes token 'requiring' => 0.985096774193548
debug: bayes token 'Tuesday' => 0.0152914450802208
debug: bayes token 'Afghanistan' => 0.0155373854267987
debug: bayes token 'BUSH' => 0.015665955109187
debug: bayes token 'Stephanie' => 0.0169977759969479
debug: bayes token 'articles' => 0.0170062914690695
debug: bayes token 'contention' => 0.982831618107102
debug: bayes token 'humiliation' => 0.0173548387096774
debug: bayes token 'Rep' => 0.0173548387096774
debug: bayes token '4200' => 0.0173548387096774
debug: bayes token 'Angle' => 0.0173548387096774
debug: bayes token 'Politics' => 0.0187579456346197
debug: bayes token 'Loren' => 0.019482267571591
debug: bayes token 'PRODUCTS' => 0.978
debug: bayes token 'witnesses' => 0.978
debug: bayes token 'schedules' => 0.978
debug: bayes token 'gov' => 0.978
debug: bayes token 'Ricardo' => 0.978
debug: bayes token 'award-winning' => 0.978
debug: bayes token 'insisted' => 0.978
debug: bayes token 'biotech' => 0.978
debug: bayes token 'uncovered' => 0.978
debug: bayes token 'closures' => 0.978
debug: bayes token 'profiling' => 0.978
debug: bayes token 'frank' => 0.977757641063746
debug: bayes token 'Published' => 0.0228176647491692
debug: bayes token 'Abu' => 0.0228176647491692
debug: bayes token 'Action' => 0.0237349212528429
debug: bayes token 'TODAY'S' => 0.0241620068249791
debug: bayes token 'Deputy' => 0.0256190476190476
debug: bayes token 'authorization' => 0.0256190476190476
debug: bayes token 'ABUSE' => 0.0256190476190476
debug: bayes token 'Geoffrey' => 0.0256190476190476
debug: bayes token 'Quarterly' => 0.0256190476190476
debug: bayes token 'BILL' => 0.0256190476190476
debug: bayes token 'Gov' => 0.0257577769901189
debug: bayes token 'plague' => 0.973548163148501
debug: bayes token 'discretionary' => 0.971950991292676
debug: bayes token 'attending' => 0.03030935155937
debug: bayes token 'Martha' => 0.0329079715403525
debug: bayes token 'Diane' => 0.0372427540311863
debug: bayes token 'miller' => 0.962144772742113
debug: bayes token 'March' => 0.0380388185583064
debug: bayes token 'sexual' => 0.958104408601979
debug: bayes token 'hearings' => 0.0419968615545856
debug: bayes token 'ongoing' => 0.958
debug: bayes token 'BUDGET' => 0.958
debug: bayes token 'Capitol' => 0.958
debug: bayes token 'commands' => 0.958
debug: bayes token 'BASE' => 0.958
debug: bayes token 'VOTE' => 0.958
debug: bayes token 'hazardous' => 0.958
debug: bayes token 'honors' => 0.958
debug: bayes token 'BOOST' => 0.958
debug: bayes token 'demonstrations' => 0.958
debug: bayes token 'UD:ORG' => 0.958
debug: bayes token 'UPDATE' => 0.958
debug: bayes token 'Congressional' => 0.0442176590202618
debug: bayes token 'PDF' => 0.0443248289981357
debug: bayes token 'hoping' => 0.045210507247701
debug: bayes token 'cleared' => 0.954614368617517
debug: bayes token 'David' => 0.04564261363836
debug: bayes token 'STORY' => 0.0470387206646425
debug: bayes token 'threaten' => 0.951807967300554
debug: bayes token 'Republican' => 0.0483442976731596
debug: bayes token 'H*c:plain' => 0.0484269986413193
debug: bayes token 'Affairs' => 0.0486078727290241
debug: bayes token 'closings' => 0.0489090909090909
debug: bayes token 'DEFENSE' => 0.0489090909090909
debug: bayes token 'Sons' => 0.0489090909090909
debug: bayes token 'Army' => 0.0489090909090909
debug: bayes token '535' => 0.0489090909090909
debug: bayes token 'voters' => 0.0489090909090909
debug: bayes token 'GOP' => 0.0489090909090909
debug: bayes token 'highlighting' => 0.0489090909090909
debug: bayes token 'attends' => 0.0489090909090909
debug: bayes token 'cancels' => 0.0489090909090909
debug: bayes token 'briefings' => 0.0489090909090909
debug: bayes token 'interrogations' => 0.0489090909090909
debug: bayes token 'rely' => 0.0489090909090909
debug: bayes token 'detainee' => 0.0489090909090909
debug: bayes token 'LEADER' => 0.0489090909090909
debug: bayes token 'Cabinet' => 0.0489090909090909
debug: bayes: score = 0.5
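
A possible mechanism, offered as a sketch rather than a confirmed diagnosis: if the per-token probabilities are combined with a Fisher/Robinson-style chi-squared scheme (an assumption here; the 2.63 source is not quoted in this message), then a message with many strong spam tokens and many strong ham tokens drives both the spam and the ham indicator to roughly 1, and the final score lands on 0.5. The function names and the token values below are illustrative, not taken from SpamAssassin.

```python
import math

def chi2q(x, dof):
    """Upper-tail probability of a chi-squared variable with an even
    number of degrees of freedom (closed-form series, no SciPy needed)."""
    assert dof > 0 and dof % 2 == 0
    m = x / 2.0
    term = math.exp(-m)  # underflows to 0.0 for very large m, giving Q = 0
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Chi-squared combining of per-token spam probabilities,
    each strictly between 0 and 1 (hypothetical helper)."""
    n = len(probs)
    ln_spam = sum(math.log(1.0 - p) for p in probs)
    ln_ham = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # strength of the spam evidence
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # strength of the ham evidence
    return (s - h + 1.0) / 2.0

# Made-up values, loosely modelled on the mix in the debug list above.
balanced = [0.99] * 20 + [0.01] * 20
mostly_spam = [0.99] * 20 + [0.01] * 2

print("balanced   :", combine(balanced))
print("mostly spam:", combine(mostly_spam))
```

Under these assumptions the balanced list comes out within rounding error of 0.5, while the lopsided one stays close to 1. That would make the middle a natural resting point for mail with strong evidence on both sides, rather than a statistical fluke.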

I'm running SpamAssassin 2.63 with a Bayes DB like this:

0.000          0          2          0  non-token data: bayes db version
0.000          0     114942          0  non-token data: nspam
0.000          0      39789          0  non-token data: nham
0.000          0     138554          0  non-token data: ntokens
0.000          0 1084813996          0  non-token data: oldest atime
0.000          0 1084991383          0  non-token data: newest atime
0.000          0 1084990603          0  non-token data: last journal sync atime
0.000          0 1084986837          0  non-token data: last expiry atime
0.000          0     172800          0  non-token data: last expire atime delta
0.000          0      13664          0  non-token data: last expire reduction count

Is anyone else noticing this?

Pierre Thomson
BIC
