Tim,

Thanks for the reply. I understand what you're talking about with papering over the problem.
I've included the full traceback that you get when you run the script I provided. Hopefully this will provide some information. Any ideas on how to resolve this would be great -- I'm fairly new to Python. Also, I upgraded to 1.1a2 and it's still occurring.

17:53:27 (~/src/spambayes) [EMAIL PROTECTED]> ./test.py
Traceback (most recent call last):
  File "./test.py", line 9, in ?
    h.filter('do you want some viagra')
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/hammie.py", line 155, in filter
    debug, train)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/hammie.py", line 109, in score_and_filter
    prob, clues = self._scoremsg(msg, True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 499, in _getclues
    tup = self._worddistanceget(word)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 514, in _worddistanceget
    prob = self.probability(record)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 320, in probability
    prob = spamratio / (hamratio + spamratio)
ZeroDivisionError: float division

On 7/14/06, Tim Peters <[EMAIL PROTECTED]> wrote:
> [Todd Kennedy]
> > With the definitions of spamcount and hamcount it makes sense that
> > they might be zero, since there is minimal training data in the
> > system, and the word being scored does not exist in the database.
> >
> > This might be some sort of small bug with running the filter on a
> > small amount of data, as I can reliably replicate a divide by zero
> > error.  If spamcount and hamcount are both zero, shouldn't the system
> > return some sort of 0% probability for spam or ham (showing its
> > uncertainty for the phrase being scored)?
>
> Yes, and it does.  That's what Kenny tried to tell you :-)  This is
> Classifier._worddistanceget():
>
>     def _worddistanceget(self, word):
>         record = self._wordinfoget(word)
>         if record is None:
>             prob = options["Classifier", "unknown_word_prob"]
>         else:
>             prob = self.probability(record)
>         distance = abs(prob - 0.5)
>         return distance, prob, word, record
>
> If there is no record for the word, then this returns the value of the
> "unknown_word_prob" option.  It only tries to _compute_ the
> probability if there _is_ a record for the word, and it should never
> be the case that a record exists for a word with hamcount and
> spamcount both 0.
>
> It would be helpful to dump print statements into that function (or
> run under Python's debugger) to see exactly which word it is and
> what's in that record -- or possibly you'd discover that
> _worddistanceget() isn't being called at all.  You didn't include a
> complete traceback in your original message, so it's impossible from
> here to guess who called probability() to begin with.  A complete
> traceback would help.
>
> > ...
> > If I change line 320 of classifier.py (I'm using the latest 1.1a1 release
> > now) to a very simple try/except clause:
> >     try:
> >         prob = spamratio / (hamratio + spamratio)
> >     except:
> >         prob = 0
> >
> > You can't replicate the error with the above script.
> >
> > Is this a patch that should be submitted?
>
> No, because that slows down a speed-critical function to paper over a
> problem that should never occur.  The bug isn't that this is dividing
> by 0, the bug is that probability() is being _called_ when both counts
> are 0.
> Something, somewhere, on the path _toward_ calling
> probability() is in error.

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
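
For reference, the failure discussed in this thread can be sketched in isolation. This is a minimal illustration, not the spambayes code: `WordRecord`, `probability` and `checked_probability` below are hypothetical stand-ins, and the ratio step is simplified from the line shown in the traceback (the real method takes its message totals from the classifier, which is assumed away here by passing `nspam` and `nham` in directly). It only shows that a record with both counts at 0 triggers the division error, and how a diagnostic check could name the offending word before the division is reached.

```python
class WordRecord:
    """Hypothetical stand-in for a per-word record (not the spambayes class)."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount


def probability(record, nspam, nham):
    """Simplified ratio step; nspam and nham are assumed to be positive."""
    hamratio = record.hamcount / float(nham)
    spamratio = record.spamcount / float(nspam)
    # Raises ZeroDivisionError when both counts in the record are 0.
    return spamratio / (hamratio + spamratio)


def checked_probability(word, record, nspam, nham):
    """Diagnostic wrapper: report the word instead of failing in the division."""
    if record.spamcount == 0 and record.hamcount == 0:
        raise ValueError(
            "record for %r has spamcount == hamcount == 0" % (word,)
        )
    return probability(record, nspam, nham)


if __name__ == "__main__":
    print(probability(WordRecord(spamcount=1, hamcount=1), 2, 2))
    try:
        checked_probability("viagra", WordRecord(0, 0), 2, 2)
    except ValueError as exc:
        print("diagnostic: %s" % exc)
```

A check like this is a debugging aid only. It does not explain how an all-zero record got into the database, which is the question raised in the quoted reply.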