on Thu May 10 2007, skip-AT-pobox.com wrote:

>     Dave> There really is something very fishy going on. I actually added
>     Dave> instrumentation code to watch my training script train particular
>     Dave> words multiple times as ham or spam, but when I query those words
>     Dave> using the sb_imapfilter web interface, they always are shown as
>     Dave> having been trained 0 or 1 times, with one of two corresponding
>     Dave> probabilities.
>
>     Dave> I do a wildcard query with a single letter and returning 1000
>     Dave> results, and there's not a single number over 1 in the #spam or
>     Dave> #ham columns.
>
>     Dave> What could be going on?
>
> I've no idea. It seems to be working for me. I have lots of singletons(*),
> which is to be expected, but also lots of multiples:

OK, a couple of questions:

1. What kind of database are you using? Maybe this is something in the
   DBM handling?

2. Have you tried my patchset yet? I'd like to know if it's somehow a
   bug I introduced.

> (*) Linguists call such singletons "hapax legomena". I guess they were
> trying to be snooty when they came up with that term.

Oh, they weren't just _trying_ ;-)

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Don't Miss BoostCon 2007! ==> http://www.boostcon.com
