From: "Michael Monnerie" <[EMAIL PROTECTED]>
Jane made a good statement about writing rules to make a peak around 5.0, to clearly indicate SPAM or HAM. Sounds reasonable, but I didn't test it, because I don't happen to have any FPs.
Actually, it's Joanne, not Jane. {^_-} And the point I made was to keep the region right around 5.0 as clean of ambiguous cases as it is possible to maintain.

It MAY be that a rule's reliability should govern the score it contributes when it fires. And the scores should include a sprinkling of negative values along with the mostly positive ones.

It seems like Kalman filter approaches might do some real good. In fact, a REAL Kalman filter that trains on feedback the way Bayes trains on feedback might produce some really interesting results, as well as weed out rules that seem to amount to little or nothing at present. Somebody here did discuss a dynamic scoring engine approach; I wonder how far he got with it. His initial report sounded quite promising, and it's an ideal setting for Kalman-style techniques: "This rule is good for conditions A, C, and D but not B..."

I really do like the idea of creating a dead zone, with neither ham nor spam in it, right around a score of 5, and separate peaks for ham and spam on either side of that empty zone. It may be hard to force that kind of selection without some fancy processing, though. {^_^}
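Purely to illustrate the Kalman-filter idea, here is a minimal Python sketch. It is not the dynamic scoring engine mentioned in the thread, and every name and number in it is a hypothetical assumption: the class and helper names, the targets of 10 for spam and 0 for ham, the noise and drift constants, and the toy feedback. The rule scores are treated as the state of a linear Kalman filter. Each confirmed ham or spam is one noisy observation of a message's total score, so the scores of consistently firing rules settle down, and each rule's variance gives one possible measure of how reliable that rule is. Training spam toward 10 and ham toward 0 is one crude way to try to empty out the region around 5.0.

```python
# Minimal sketch: learning rule scores online with a Kalman filter.
# Assumptions (hypothetical, not from the post): a message's score is the sum
# of the scores of the rules it hits; confirmed spam is trained toward
# SPAM_TARGET and confirmed ham toward HAM_TARGET, which pushes both classes
# away from the 5.0 threshold.

SPAM_TARGET = 10.0  # hypothetical target score for confirmed spam
HAM_TARGET = 0.0    # hypothetical target score for confirmed ham
THRESHOLD = 5.0     # spam/ham cut-off discussed in the thread


class KalmanRuleScorer:
    """Tracks an estimate and an uncertainty (covariance) for every rule score."""

    def __init__(self, n_rules, initial_var=10.0, obs_noise=1.0, drift=0.01):
        self.n = n_rules
        self.w = [0.0] * n_rules  # current score estimate for each rule
        # Covariance of the estimates; a large diagonal means "little known yet".
        self.P = [[initial_var if i == j else 0.0 for j in range(n_rules)]
                  for i in range(n_rules)]
        self.r = obs_noise  # variance assumed for each feedback observation
        self.q = drift      # random-walk drift, lets scores keep adapting

    def score(self, hits):
        """Total score of a message given its 0/1 rule-hit vector."""
        return sum(wi * xi for wi, xi in zip(self.w, hits))

    def uncertainty(self, rule):
        """Variance of one rule's score: roughly the inverse of its reliability."""
        return self.P[rule][rule]

    def update(self, hits, is_spam):
        """Fold one piece of ham/spam feedback into the rule scores."""
        n = self.n
        # Predict step: scores may drift, so uncertainty grows slightly.
        for i in range(n):
            self.P[i][i] += self.q
        target = SPAM_TARGET if is_spam else HAM_TARGET
        px = [sum(self.P[i][j] * hits[j] for j in range(n)) for i in range(n)]
        s = sum(hits[i] * px[i] for i in range(n)) + self.r  # innovation variance
        gain = [v / s for v in px]                           # Kalman gain
        residual = target - self.score(hits)
        self.w = [wi + ki * residual for wi, ki in zip(self.w, gain)]
        # Covariance update P <- P - K (P x)^T (valid because P is symmetric).
        self.P = [[self.P[i][j] - gain[i] * px[j] for j in range(n)]
                  for i in range(n)]
        return residual


def dead_zone_fraction(scores, half_width=1.0):
    """Fraction of messages that land within half_width of THRESHOLD."""
    if not scores:
        return 0.0
    inside = sum(1 for s in scores if abs(s - THRESHOLD) <= half_width)
    return inside / len(scores)


# Toy, synthetic feedback: rule 0 fires on all spam and on some ham,
# rule 1 fires only on ham.
feedback = [([1, 0], True), ([1, 1], False)] * 50
scorer = KalmanRuleScorer(n_rules=2)
for hits, is_spam in feedback:
    scorer.update(hits, is_spam)

final_scores = [scorer.score(hits) for hits, _ in feedback]
print("rule scores:", [round(w, 2) for w in scorer.w])
print("fraction near 5.0:", dead_zone_fraction(final_scores))
```

In this toy run the ham-only rule should settle on a negative score (near -10) while the spam messages score close to 10, so no message should land within a point of 5.0. Real mail would not separate this cleanly; the dead_zone_fraction helper is just one way to watch whether the region around the threshold is actually emptying out.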