Re: [Spambayes] Training problem

skip Sat, 01 Sep 2007 18:51:30 -0700

    kudret> # ham trained on: 2559
    kudret> # spam trained on: 48


Looks very imbalanced.  We usually see the imbalance in the other direction
(lots of spam, few ham), but this far out of whack in either direction might
present problems.  I suggest you clear your database out completely and
start from scratch.  Train a couple hams, then a couple spams.  Rescore
everything.  Train on a couple mistakes or unsures of each type.  Rescore
the rest.
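
As a rough illustration of why such an imbalance can matter, here is a minimal sketch of a per-token spam probability estimate. It assumes, as I understand SpamBayes to do, that token counts are normalized by the number of messages trained in each class and then pulled toward 0.5 by a small prior. The function name, the token counts (1 spam, 10 hams) and the 0.45/0.5 constants are assumptions made for the example, not values taken from this thread.

```python
def token_spamprob(spamcount, hamcount, nspam, nham,
                   unknown_strength=0.45, unknown_prob=0.5):
    """Estimate the spam probability of one token (illustrative sketch)."""
    # Fraction of trained messages of each class that contain the token.
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    prob = spamratio / (spamratio + hamratio)
    # Pull the estimate toward unknown_prob when there is little evidence.
    n = spamcount + hamcount
    return (unknown_strength * unknown_prob + n * prob) / (unknown_strength + n)


# Hypothetical token seen in 1 trained spam and 10 trained hams.
imbalanced = token_spamprob(1, 10, nspam=48, nham=2559)
balanced = token_spamprob(1, 10, nspam=1000, nham=1000)
print(f"imbalanced: {imbalanced:.2f}  balanced: {balanced:.2f}")
```

Under these assumptions the token comes out looking spammy with the 2559/48 split even though it appeared in ten times as many hams, because one spam out of 48 is a much larger fraction than ten hams out of 2559.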

    kudret> How is it possible that two similar token lists get such
    kudret> different scores: one gets 45%, the other 0%?

The many hammy tokens in the second message outweigh its few spammy tokens.
In the first message the numbers of hammy and spammy tokens are more
balanced, so the overall score is nearer to the middle.
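
The effect can be sketched with a chi-squared combining rule, which is how I understand SpamBayes to turn per-token probabilities into a message score. This is a simplified illustration, not the actual SpamBayes code: the function names and the token probabilities (0.05 for a hammy token, 0.95 for a spammy one) are invented for the example.

```python
import math


def chi2q(x2, dof):
    """Chi-squared survival function for an even number of degrees of freedom."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5
    # Evidence for spam and evidence for ham are measured separately.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


# Twenty hammy tokens against three spammy ones, versus an even split.
mostly_ham = combine([0.05] * 20 + [0.95] * 3)
balanced_mix = combine([0.05] * 5 + [0.95] * 5)
print(f"mostly ham: {mostly_ham:.4f}  balanced mix: {balanced_mix:.4f}")
```

With these made-up inputs the mostly-ham list scores very close to 0, while the even split lands in the middle, since the spam and ham evidence cancel out.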

Skip
_______________________________________________
[email protected]
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
