Re: getting bayes back on track after habeas mess

little 22 Mar 2004 03:50:24 -0000

>  On Sun, 21 Mar 2004 12:27:17 -0800 (PST), [EMAIL PROTECTED] wrote:
>  >>  $ sa-learn --dump data | grep -i viagra:
>  >>  0.049          0          1 1078173704  ViagrAa!
>  >
>  >Just to be sure... the above line means that the word "ViagrAa!"
>  >was in zero ham and one spam mails that were sa-learned.  Am 
>  >I reading the columns correctly?
>  >
>  >If so, we've got tons of mis-learned variants on viagra :-(
>  
>  Nah, it's the other way around.  "ViagrAa!" is a ham indicator for
>  LuKreme.


Oops, yeah, that's what I meant. Although the sa-learn man page
seems to imply that the first number is ham (under the "bayes_toks"
section). Is that just an error in the man page?
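For anyone else squinting at the dump, here is a minimal Python sketch of reading one `sa-learn --dump data` line. It assumes the column order given in the correction above: probability, spam count, ham count, last-access time (assumed to be Unix seconds), then the token. The names `DumpToken` and `parse_dump_line` are hypothetical, not part of SpamAssassin.

```python
from typing import NamedTuple


class DumpToken(NamedTuple):
    prob: float  # Bayes probability that the token indicates spam
    nspam: int   # assumed: number of learned spam messages containing the token
    nham: int    # assumed: number of learned ham messages containing the token
    atime: int   # assumed: last-access time, Unix seconds
    token: str


def parse_dump_line(line: str) -> DumpToken:
    # Split on runs of whitespace at most 4 times so the token field stays whole
    prob, nspam, nham, atime, token = line.split(None, 4)
    return DumpToken(float(prob), int(nspam), int(nham), int(atime), token.rstrip())


if __name__ == "__main__":
    t = parse_dump_line("0.049          0          1 1078173704  ViagrAa!")
    print(t)
```

With the sample line quoted above, `nspam` comes out 0 and `nham` 1, and the low probability agrees that the token was seen only in ham.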

Anyway, our results seem to indicate that auto-learn is a dangerous
thing to turn on. The docs mostly make it sound like a good thing, but
even with the ham threshold at 0.1 or 0.0 (on the non-Bayes score), a
lot of spam seems to fall below it and get learned as ham. Arg.
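If auto-learn is the culprit, a hedged sketch of the two obvious knobs in local.cf follows. The option names are from memory of the SpamAssassin configuration docs and should be checked against the Mail::SpamAssassin::Conf page for your installed version before relying on them.

```
# local.cf sketch (verify option names for your SpamAssassin version)

# Option 1: stop Bayes auto-learning entirely and train only with sa-learn
bayes_auto_learn 0

# Option 2: keep auto-learn, but with the stricter ham threshold
# mentioned above (compared against the non-Bayes score)
# bayes_auto_learn_threshold_nonspam 0.0
```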

                -glenn
