Hi all,

Quite a lot of my mail is scoring so near 50% (0.49999 - 0.50001) that it doesn't show any Bayes score in the summary. This is mostly just an inconvenience, but I wonder if there is a "pull" towards the middle that makes this happen. I am seeing it in about 2% of my email, much more than I would expect statistically.
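To make the "pull" question concrete, here is a minimal sketch of how a chi-square style combiner behaves. It assumes the Bayes code combines token probabilities with Robinson's chi-square method; I have not checked that against the 2.63 source, and the names `chi2q` and `combine` are made up for this illustration. Under that assumption, a message with many strong spam tokens and many strong ham tokens saturates both evidence terms, and the two cancel to almost exactly 0.5.

```python
import math

def chi2q(x2, v):
    """Upper-tail probability of a chi-square value x2 with v degrees
    of freedom (v must be even), computed as a finite series."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)
    into one score. Assumption: Robinson-style chi-square combining,
    not taken from the SpamAssassin source."""
    n = len(probs)
    # Evidence for spam: near 1 when many tokens have p close to 1.
    spam_ev = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: near 1 when many tokens have p close to 0.
    ham_ev = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # When both are near 1 they cancel and the result lands at the midpoint.
    return (1.0 + spam_ev - ham_ev) / 2.0

if __name__ == "__main__":
    mixed = [0.999] * 20 + [0.001] * 20   # hypothetical token values
    print(combine(mixed))                 # strong evidence on both sides
    print(combine([0.999] * 20))          # spam tokens only
    print(combine([0.001] * 20))          # ham tokens only
```

If that is how the score is built, then landing in the 0.49999 - 0.50001 band would not need an explicit pull: any message with plenty of strong tokens on both sides would end up there.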
I run auto-learn with thresholds of -0.1 and 12 for ham and spam, respectively, and hand train only what falls on the wrong side of the threshold, mostly "sham" and mailing list stuff. I should add that Bayes is performing beautifully on the whole, giving BAYES_99 scores to much of the spam and BAYES_00 to much of the ham.

Here's the debug of today's Congressional Quarterly update:

debug: bayes token 'norton' => 0.999707779886148
debug: bayes token 'botulism' => 0.999705919796308
debug: bayes token 'Edited' => 0.000346872985170858
debug: bayes token 'codify' => 0.999652892561984
debug: bayes token 'Bush's' => 0.00038401142041399
debug: bayes token 'trillion' => 0.999423220973783
debug: bayes token 'Sept' => 0.000895174708818636
debug: bayes token 'biology' => 0.998604229607251
debug: bayes token 'nod' => 0.998514469453376
debug: bayes token 'H*c:ISO-8859-1' => 0.998355871886121
debug: bayes token 'N:H*c:ISO-NNNN-N' => 0.998355871886121
debug: bayes token 'entrepreneur' => 0.998159362549801
debug: bayes token 'presidential' => 0.997810426540284
debug: bayes token 'sanchez' => 0.997581151832461
debug: bayes token 'Sen' => 0.00243438914027149
debug: bayes token 'toxin' => 0.997447513812155
debug: bayes token 'veto' => 0.997298245614035
debug: bayes token 'heightened' => 0.997298245614035
debug: bayes token 'thrown' => 0.997298245614035
debug: bayes token 'sk:johannh' => 0.00281675392670157
debug: bayes token 'widespread' => 0.996940397350993
debug: bayes token 'IRAQ' => 0.0033416149068323
debug: bayes token 'manslaughter' => 0.996473282442748
debug: bayes token 'Armed' => 0.00381560283687943
debug: bayes token 'predecessor' => 0.996181818181818
debug: bayes token 'POLITICS' => 0.00410687022900763
debug: bayes token 'lawmakers' => 0.00410687022900763
debug: bayes token 'rostrum' => 0.995837837837838
debug: bayes token 'U*johannh' => 0.00444628099173554
debug: bayes token 'Legislative' => 0.00484684684684685
debug: bayes token 'expire' => 0.994923076923077
debug: bayes token 'conviction' => 0.994296296296296
debug: bayes token 'vacant' => 0.994296296296296
debug: bayes token 'Hill' => 0.00578629790239589
debug: bayes token 'WHITE' => 0.00591208791208791
debug: bayes token 'smallpox' => 0.993492957746479
debug: bayes token 'cuts' => 0.992426229508197
debug: bayes token 'slog' => 0.992426229508197
debug: bayes token 'Rayburn' => 0.992426229508197
debug: bayes token 'inspector' => 0.992426229508197
debug: bayes token 'LEAD' => 0.0075774647887324
debug: bayes token 'Ghraib' => 0.0075774647887324
debug: bayes token 'sk:bruderh' => 0.0077612241482167
debug: bayes token 'Associate' => 0.990941176470588
debug: bayes token 'Labor' => 0.990941176470588
debug: bayes token 'awareness' => 0.990941176470588
debug: bayes token 'lisa' => 0.990941176470588
debug: bayes token 'amendments' => 0.990941176470588
debug: bayes token 'HPrecedence:bulk' => 0.0100743380045442
debug: bayes token 'meanwhile' => 0.0103799014049242
debug: bayes token 'POLITICAL' => 0.0105490196078431
debug: bayes token 'authorize' => 0.0105490196078431
debug: bayes token 'transcripts' => 0.0109482190823829
debug: bayes token 'Sioux' => 0.988731707317073
debug: bayes token 'reconciliation' => 0.0121050914533672
debug: bayes token 'interrogation' => 0.0131219512195122
debug: bayes token 'Aug' => 0.0131219512195122
debug: bayes token 'Chuck' => 0.0134463470755773
debug: bayes token 'unsub.php' => 0.986289792995557
debug: bayes token 'operations' => 0.985096774193548
debug: bayes token '$10' => 0.985096774193548
debug: bayes token 'POLICY' => 0.985096774193548
debug: bayes token 'encompass' => 0.985096774193548
debug: bayes token 'Oregon' => 0.985096774193548
debug: bayes token 'jay' => 0.985096774193548
debug: bayes token 'Comprehensive' => 0.985096774193548
debug: bayes token 'Republicans' => 0.985096774193548
debug: bayes token 'requiring' => 0.985096774193548
debug: bayes token 'Tuesday' => 0.0152914450802208
debug: bayes token 'Afghanistan' => 0.0155373854267987
debug: bayes token 'BUSH' => 0.015665955109187
debug: bayes token 'Stephanie' => 0.0169977759969479
debug: bayes token 'articles' => 0.0170062914690695
debug: bayes token 'contention' => 0.982831618107102
debug: bayes token 'humiliation' => 0.0173548387096774
debug: bayes token 'Rep' => 0.0173548387096774
debug: bayes token '4200' => 0.0173548387096774
debug: bayes token 'Angle' => 0.0173548387096774
debug: bayes token 'Politics' => 0.0187579456346197
debug: bayes token 'Loren' => 0.019482267571591
debug: bayes token 'PRODUCTS' => 0.978
debug: bayes token 'witnesses' => 0.978
debug: bayes token 'schedules' => 0.978
debug: bayes token 'gov' => 0.978
debug: bayes token 'Ricardo' => 0.978
debug: bayes token 'award-winning' => 0.978
debug: bayes token 'insisted' => 0.978
debug: bayes token 'biotech' => 0.978
debug: bayes token 'uncovered' => 0.978
debug: bayes token 'closures' => 0.978
debug: bayes token 'profiling' => 0.978
debug: bayes token 'frank' => 0.977757641063746
debug: bayes token 'Published' => 0.0228176647491692
debug: bayes token 'Abu' => 0.0228176647491692
debug: bayes token 'Action' => 0.0237349212528429
debug: bayes token 'TODAY'S' => 0.0241620068249791
debug: bayes token 'Deputy' => 0.0256190476190476
debug: bayes token 'authorization' => 0.0256190476190476
debug: bayes token 'ABUSE' => 0.0256190476190476
debug: bayes token 'Geoffrey' => 0.0256190476190476
debug: bayes token 'Quarterly' => 0.0256190476190476
debug: bayes token 'BILL' => 0.0256190476190476
debug: bayes token 'Gov' => 0.0257577769901189
debug: bayes token 'plague' => 0.973548163148501
debug: bayes token 'discretionary' => 0.971950991292676
debug: bayes token 'attending' => 0.03030935155937
debug: bayes token 'Martha' => 0.0329079715403525
debug: bayes token 'Diane' => 0.0372427540311863
debug: bayes token 'miller' => 0.962144772742113
debug: bayes token 'March' => 0.0380388185583064
debug: bayes token 'sexual' => 0.958104408601979
debug: bayes token 'hearings' => 0.0419968615545856
debug: bayes token 'ongoing' => 0.958
debug: bayes token 'BUDGET' => 0.958
debug: bayes token 'Capitol' => 0.958
debug: bayes token 'commands' => 0.958
debug: bayes token 'BASE' => 0.958
debug: bayes token 'VOTE' => 0.958
debug: bayes token 'hazardous' => 0.958
debug: bayes token 'honors' => 0.958
debug: bayes token 'BOOST' => 0.958
debug: bayes token 'demonstrations' => 0.958
debug: bayes token 'UD:ORG' => 0.958
debug: bayes token 'UPDATE' => 0.958
debug: bayes token 'Congressional' => 0.0442176590202618
debug: bayes token 'PDF' => 0.0443248289981357
debug: bayes token 'hoping' => 0.045210507247701
debug: bayes token 'cleared' => 0.954614368617517
debug: bayes token 'David' => 0.04564261363836
debug: bayes token 'STORY' => 0.0470387206646425
debug: bayes token 'threaten' => 0.951807967300554
debug: bayes token 'Republican' => 0.0483442976731596
debug: bayes token 'H*c:plain' => 0.0484269986413193
debug: bayes token 'Affairs' => 0.0486078727290241
debug: bayes token 'closings' => 0.0489090909090909
debug: bayes token 'DEFENSE' => 0.0489090909090909
debug: bayes token 'Sons' => 0.0489090909090909
debug: bayes token 'Army' => 0.0489090909090909
debug: bayes token '535' => 0.0489090909090909
debug: bayes token 'voters' => 0.0489090909090909
debug: bayes token 'GOP' => 0.0489090909090909
debug: bayes token 'highlighting' => 0.0489090909090909
debug: bayes token 'attends' => 0.0489090909090909
debug: bayes token 'cancels' => 0.0489090909090909
debug: bayes token 'briefings' => 0.0489090909090909
debug: bayes token 'interrogations' => 0.0489090909090909
debug: bayes token 'rely' => 0.0489090909090909
debug: bayes token 'detainee' => 0.0489090909090909
debug: bayes token 'LEADER' => 0.0489090909090909
debug: bayes token 'Cabinet' => 0.0489090909090909
debug: bayes: score = 0.5

I'm running 2.63 with a Bayes DB like this:

0.000          0          2          0  non-token data: bayes db version
0.000          0     114942          0  non-token data: nspam
0.000          0      39789          0  non-token data: nham
0.000          0     138554          0  non-token data: ntokens
0.000          0 1084813996          0  non-token data: oldest atime
0.000          0 1084991383          0  non-token data: newest atime
0.000          0 1084990603          0  non-token data: last journal sync atime
0.000          0 1084986837          0  non-token data: last expiry atime
0.000          0     172800          0  non-token data: last expire atime delta
0.000          0      13664          0  non-token data: last expire reduction count

Is anyone else noticing this?

Pierre Thomson
BIC
