jdow wrote:

And the point I made is to keep the region right around 5.0 swept as
clean of ambiguous cases as it is possible to maintain. It MAY be that
the reliability of a rule should govern its score when it is used. And
the rule set should include a sprinkling of negative scores alongside
the mostly positive ones. It seems like Kalman filter approaches might
do some real good.
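Just to make "reliability should govern the score" concrete, here is a
rough sketch. Everything below is invented for illustration (the counts,
the corpus sizes, the scaling) and is not how SA actually assigns scores;
it just scales a rule's nominal score by how cleanly the rule separates
spam from ham in a corpus.

# Sketch: scale a rule's nominal score by its observed reliability.
# All counts are hypothetical; SA does not compute scores this way.

def reliability(hits_spam, hits_ham, total_spam, total_ham):
    # Fraction of the rule's firing that lands on spam, corrected for corpus size.
    spam_rate = hits_spam / total_spam
    ham_rate = hits_ham / total_ham
    if spam_rate + ham_rate == 0:
        return 0.5                      # rule never fires: no information
    return spam_rate / (spam_rate + ham_rate)

def effective_score(nominal, hits_spam, hits_ham, total_spam=10000, total_ham=10000):
    r = reliability(hits_spam, hits_ham, total_spam, total_ham)
    # Map reliability 0..1 onto a multiplier -1..+1 so unreliable rules
    # contribute almost nothing and "backwards" rules go slightly negative.
    return nominal * (2.0 * r - 1.0)

print(effective_score(3.0, hits_spam=3000, hits_ham=100))   # ~2.8: fires mostly on spam
print(effective_score(3.0, hits_spam=500, hits_ham=500))    # 0.0: fires evenly, no value

A rule that fires almost exclusively on spam keeps most of its score, one
that fires evenly on both is driven toward zero, and one that fires mostly
on ham ends up with the negative score jdow mentions.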

What about replacing Bayes with Support Vector Machines? Has anyone played with this?
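For what it's worth, a toy version of that experiment is easy to mock up
with scikit-learn: treat each message's rule hits as a binary feature
vector and fit a linear SVM on labelled ham/spam. The feature extraction
is assumed here, not something SA provides out of the box.

# Sketch: linear SVM over per-message rule-hit vectors (needs scikit-learn).
# X[i][j] = 1 if rule j fired on message i; y[i] = 1 for spam, 0 for ham.
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def train_rule_svm(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = LinearSVC(C=1.0)              # C trades margin width against slack
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    # clf.coef_[0] holds one weight per rule, roughly comparable to SA scores.
    return clf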

In fact, a REAL Kalman filter that trains on feedback the way Bayes
does might produce some really interesting results, as well as weed out
rules that currently amount to little or nothing. Somebody here discussed
a dynamic scoring engine approach; I wonder how far he got with it. His
initial report sounded quite promising, and it's an ideal setting for
Kalman-style techniques: "This rule is good for conditions A, C, and D
but not B..."
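To make the Kalman idea a bit more concrete, here is a toy scalar version
for a single rule. Every parameter below is invented: the rule's "true"
score is treated as a hidden state, and each piece of confirmed-spam or
confirmed-ham feedback nudges the estimate, with the Kalman gain deciding
how much to trust the new observation versus the current estimate.

# Sketch: scalar Kalman filter tracking one rule's "true" score from feedback.
class RuleScoreFilter:
    def __init__(self, score=1.0, variance=1.0,
                 process_noise=0.01, obs_noise=2.0):
        self.score = score                    # current estimate of the rule's worth
        self.variance = variance              # uncertainty of that estimate
        self.process_noise = process_noise    # how fast the true value may drift
        self.obs_noise = obs_noise            # how noisy one piece of feedback is

    def update(self, observed):
        # observed: e.g. +3.0 if the rule fired on confirmed spam, -1.0 on ham.
        # Predict: the world may have drifted since the last update.
        self.variance += self.process_noise
        # Correct: blend prediction and observation via the Kalman gain.
        gain = self.variance / (self.variance + self.obs_noise)
        self.score += gain * (observed - self.score)
        self.variance *= (1.0 - gain)
        return self.score

f = RuleScoreFilter()
for obs in (3.0, 2.5, -1.0, 3.0):      # feedback stream, values made up
    print(round(f.update(obs), 2))

A rule that keeps getting contradictory feedback ends up hovering near
zero with low confidence, which is exactly the "amounts to little or
nothing" case above.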

I really do like the idea of creating a dead zone right around a score
of 5 that contains neither ham nor spam, with separate peaks for ham and
spam on either side of that empty zone. It may be hard to force that kind
of separation without some fancy processing, though.
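One cheap way to get something like that dead zone at decision time is
simply to use two thresholds instead of one. A minimal sketch, with
threshold values picked out of thin air rather than taken from SA:

# Sketch: two thresholds with a dead zone in between (values illustrative).
HAM_MAX = 3.5       # below this the message is treated as ham
SPAM_MIN = 6.5      # above this the message is treated as spam

def classify(score):
    if score < HAM_MAX:
        return "ham"
    if score > SPAM_MIN:
        return "spam"
    return "unsure"  # quarantine, hand to a per-user filter, or review manually

Of course this only hides the ambiguous region; getting the score
distribution itself to have two well-separated peaks is the hard part.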
Why not use two different filters:
- SA (without Bayes or AWL)
- an adaptive filter (bogofilter has "unsure" zones)
and take the decision based on either or both, depending on how each is
configured. With a conservative setup of both, you can decide it's spam if
either filter says it is (you'll get more FNs, but few FPs). With an
aggressive setup, you can use AND. With other setups, you can make more
complex decisions, as in the sketch below. An advantage of this approach
is that you can split it into a site-wide filter (SA) and a per-user filter.
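A minimal sketch of that decision step, assuming each filter's verdict is
already available (the function and policy names are made up):

# Sketch: combine a site-wide SA verdict with a per-user adaptive filter.
# "or" policy suits conservative per-filter setups (flag if either says spam);
# "and" policy suits aggressive setups (flag only if both agree).
def combined_verdict(sa_says_spam, adaptive_verdict, policy="or"):
    # adaptive_verdict is "spam", "ham", or "unsure" (bogofilter-style).
    adaptive_spam = adaptive_verdict == "spam"
    if policy == "or":
        return sa_says_spam or adaptive_spam
    if policy == "and":
        return sa_says_spam and adaptive_spam
    # Example of a more complex rule: trust SA, but let the per-user
    # filter rescue messages it is confident are ham.
    return sa_says_spam and adaptive_verdict != "ham"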
