> On Sun, 21 Mar 2004 12:27:17 -0800 (PST), [EMAIL PROTECTED] wrote:
> >> $ sa-learn --dump data | grep -i viagra:
> >> 0.049 0 1 1078173704 ViagrAa!
> >
> >Just to be sure... the above line means that the word "ViagrAa!"
> >was in zero ham and one spam mails that were sa-learned. Am
> >I reading the columns correctly?
> >
> >If so, we've got tons of mis-learned variants on viagra :-(
>
> Nah, it's the other way around. "ViagrAa!" is a ham indicator for
> LuKreme.
Oops, yeah, that's what I meant. Although the sa-learn man page
implies that the first number is ham (under the "bayes_toks"
section). Is that just an error in the man page?
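For anyone else squinting at the columns, here is a rough Python sketch of how I now read one line of the dump. The column order (probability, spam count, ham count, last-access time, token) is an assumption based on the correction quoted above, not on the man page, and the names DumpToken and parse_dump_line are made up for illustration.

```python
from typing import NamedTuple


class DumpToken(NamedTuple):
    # Assumed column order, per the correction in this thread:
    # probability, spam count, ham count, last-access time, token.
    prob: float
    spam_count: int
    ham_count: int
    atime: int
    token: str


def parse_dump_line(line: str) -> DumpToken:
    # Split into at most five fields so a token containing spaces stays whole.
    prob, spam, ham, atime, token = line.split(None, 4)
    return DumpToken(float(prob), int(spam), int(ham), int(atime), token.rstrip())


row = parse_dump_line("0.049 0 1 1078173704 ViagrAa!")
print(row.token, "spam:", row.spam_count, "ham:", row.ham_count)
```

Read that way, the quoted line says the token was seen in zero spam and one ham message, which matches the low probability in the first column.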
In any case, our results seem to indicate that auto-learn is a dangerous
thing to do. The docs mostly make it sound like a good thing, but
even with the threshold at 0.1 or 0.0 (non-Bayes score), a lot of spam
seems to score below it and get learned as ham. Argh.
-glenn