I am using SpamBayes with nnml/Gnus and started getting this error
message when filtering incoming mail (complete traceback below):
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 314, in probability
    assert hamcount <= nham, "Token seen in more ham than ham trained."
AssertionError: Token seen in more ham than ham trained.
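For context, that assertion guards the per-token probability estimate. Below is a simplified sketch, not the SpamBayes source, of a Robinson-style calculation of this kind. The helper name `token_spamprob` is hypothetical, and the constants S=0.45 and x=0.5 are assumed defaults meant to mirror the `unknown_word_strength` and `unknown_word_prob` classifier options. A hamcount larger than nham would make the ham ratio exceed 1, which is why the check exists.

```python
def token_spamprob(spamcount, hamcount, nspam, nham, S=0.45, x=0.5):
    """Hypothetical sketch of a Robinson-style per-token spam probability.

    S (strength of the prior) and x (prior for an unseen token) are
    assumed defaults, not values read from a SpamBayes options file.
    """
    nham = float(nham or 1)
    nspam = float(nspam or 1)
    # The invariant behind the AssertionError: a token cannot occur in more
    # messages of a class than were trained in that class.
    assert hamcount <= nham, "Token seen in more ham than ham trained."
    assert spamcount <= nspam, "Token seen in more spam than spam trained."
    n = hamcount + spamcount
    if n == 0:
        # Token never seen in training: fall back to the prior.
        return x
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)
    # Shrink the raw estimate toward x, weighted by how often the token occurred.
    return (S * x + n * prob) / (S + n)
```

With nham = 10, `token_spamprob(0, 11, 10, 10)` trips the same assertion shown in the traceback.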
The only mildly nonstandard thing I do with my database is run the
standalone script sb_classify_nnml.py (below) to report the
classifier results, much like classify does in the web interface. I
have the script bound to a key in my Gnus summary buffer so I can get
a report on the current article. If the script is indeed the cause of
this problem, is there a change I can make to it that will keep the
problem from recurring?
Is there a way to fix my database, or otherwise avoid this error,
other than retraining?
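One way to see whether a database is affected is to look for tokens that break that invariant. The sketch below assumes the token counts have already been exported into a plain dictionary mapping each token to a (spamcount, hamcount) pair; `find_bad_tokens` and that dictionary layout are hypothetical, not part of the SpamBayes API, and reading a real `.hammiedb` would need the storage module's own accessors.

```python
def find_bad_tokens(wordinfo, nspam, nham):
    """Return (token, spamcount, hamcount) for every token whose counts
    exceed the number of trained messages in its class.

    wordinfo is a hypothetical plain dict: token -> (spamcount, hamcount).
    """
    bad = []
    for token, (spamcount, hamcount) in wordinfo.items():
        if hamcount > nham or spamcount > nspam:
            bad.append((token, spamcount, hamcount))
    # Sort so the report does not depend on dict ordering.
    bad.sort()
    return bad


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    counts = {"viagra": (40, 0), "meeting": (2, 130), "python": (1, 95)}
    for token, s, h in find_bad_tokens(counts, nspam=50, nham=120):
        print("%s: spamcount=%d hamcount=%d" % (token, s, h))
```

On the sample counts this prints `meeting: spamcount=2 hamcount=130`, because 130 exceeds nham = 120.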
>>> spambayes.__version__
'1.1a1+'
Thanks!
JDH
### sb_classify_nnml.py
#!/usr/bin/python
# Report the SpamBayes score and clues for one nnml article.
# Usage: sb_classify_nnml.py GROUP ARTICLE
#   e.g. sb_classify_nnml.py nnml:mail.misc 1234
import sys, os
from spambayes.hammie import Hammie
from spambayes import storage

# Open the hammie database; mode 'r' requests read-only access.
db = os.path.join(os.environ['HOME'], '.hammiedb')
classifier = storage.open_storage(db, 'dbm', 'r')
hammie = Hammie(classifier)

# Gnus passes the group as "backend:group.name"; nnml stores each group
# under ~/Mail/group/name/, one file per article number.
nnml, relpath = sys.argv[1].split(':', 1)
relpath = relpath.replace('.', os.sep)
fullpath = os.path.join(os.environ['HOME'], 'Mail', relpath, sys.argv[2])
message = open(fullpath).read()

ptotal, clues = hammie.score(message, evidence=True)
print 'Probability spam', ptotal
# clues is a list of (word, prob) pairs; list the spammiest first.
swap = [(p, word) for word, p in clues]
swap.sort()
swap.reverse()
for item in swap:
    print ' %1.1f : %s' % item
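The script's non-database logic can be pulled into small functions so it can be checked without a live `.hammiedb`. This is a sketch that mirrors the path and report handling above; the function names `article_path` and `format_clues` are hypothetical, and the nnml layout (group `mail.misc` stored under `~/Mail/mail/misc/`) is the one the script itself assumes.

```python
import os


def article_path(group, article, home):
    """Map a Gnus group such as 'nnml:mail.misc' plus an article number
    to the nnml file, using the same rule as sb_classify_nnml.py."""
    # Split only on the first colon: backend name, then the group name.
    _backend, relpath = group.split(':', 1)
    relpath = relpath.replace('.', os.sep)
    return os.path.join(home, 'Mail', relpath, str(article))


def format_clues(ptotal, clues):
    """Build the report lines: the overall score, then the clues sorted
    by descending probability. clues is a list of (word, prob) pairs."""
    lines = ['Probability spam %s' % ptotal]
    ranked = sorted([(p, word) for word, p in clues], reverse=True)
    for item in ranked:
        lines.append(' %1.1f : %s' % item)
    return lines
```

For example, `article_path('nnml:mail.misc', 1234, '/home/jdh')` gives `/home/jdh/Mail/mail/misc/1234` on a POSIX system (the home directory here is made up).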
### Traceback
Traceback (most recent call last):
  File "python/examples/spambayes_hammie.py", line 13, in ?
    ptotal, clues = hammie.score(message, evidence=True)
  File "/usr/lib/python2.4/site-packages/spambayes/hammie.py", line 62, in score
    return self._scoremsg(msg, evidence)
  File "/usr/lib/python2.4/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 499, in _getclues
    tup = self._worddistanceget(word)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 514, in _worddistanceget
    prob = self.probability(record)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 314, in probability
    assert hamcount <= nham, "Token seen in more ham than ham trained."
AssertionError: Token seen in more ham than ham trained.
_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html