Bugs item #1600821, was opened at 2006-11-21 17:59
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1600821&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: imapfilter
Group: 1.0.1
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ivan Vilata i Balaguer (ivilata)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Classifier UnicodeDecodeError on wrong transfer encoding

Initial Comment:
Running ``sb_imapfilter.py`` 1.0.1 seems to raise the following
``UnicodeDecodeError`` while classifying a mail that declares a 7-bit content
transfer encoding but contains 8-bit characters::

    Traceback (most recent call last):
      File "/usr/bin/sb_imapfilter.py", line 924, in ?
        run()
      File "/usr/bin/sb_imapfilter.py", line 914, in run
        imap_filter.Filter()
      File "/usr/bin/sb_imapfilter.py", line 785, in Filter
        self.unsure_folder)
      File "/usr/bin/sb_imapfilter.py", line 703, in Filter
        evidence=True)
      File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 190, in chi2_spamprob
        clues = self._getclues(wordstream)
      File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 496, in _getclues
        clues.sort()
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

I'm attaching the mail which caused this.  I know it is not properly formatted,
but it is a legitimate mail produced by a popular MUA (Thunderbird 1.5).  Spam
is surely formatted even worse.

Someone reported the same problem on the mailing list:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg04543.html
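
The traceback above is consistent with ``clues.sort()`` comparing a byte string containing 0xe2 against a Unicode string, which makes Python 2 attempt an implicit ASCII decode. The following is only a sketch of one possible workaround, not the SpamBayes code or its eventual fix: it coerces every token to text before sorting. It is written in Python 3 syntax so it can be run today; the helper names ``to_text`` and ``sort_clues`` and the ``(probability, word)`` tuple layout are assumptions made for illustration.

```python
def to_text(token):
    """Coerce a token to str; bytes that are not ASCII fall back to Latin-1."""
    if isinstance(token, bytes):
        try:
            return token.decode("ascii")
        except UnicodeDecodeError:
            # Latin-1 maps every byte 0x00-0xFF, so this decode cannot fail.
            return token.decode("latin-1")
    return token


def sort_clues(clues):
    """Sort hypothetical (probability, word) pairs after unifying word types."""
    return sorted((prob, to_text(word)) for prob, word in clues)


if __name__ == "__main__":
    mixed = [(0.9, b"\xe2bc"), (0.9, "abc"), (0.1, "zzz")]
    print(sort_clues(mixed))
```

Falling back to Latin-1 keeps the mislabeled bytes as distinct tokens instead of discarding them, which seems preferable for a classifier, although the decoded characters may not match what the sender intended.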

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2007-09-05 12:23

Message:
Logged In: YES 
user_id=44345
Originator: NO

Do you have a traceback?  What version of SpamBayes are you using?


----------------------------------------------------------------------

Comment By: Jesús Cea Avión (jcea)
Date: 2007-09-05 09:59

Message:
Logged In: YES 
user_id=97460
Originator: NO

I'm seeing a lot of current spam (more than one message per hour on my
system) crashing spambayes because it is marked as "ascii" while the body is
actually 8-bit.

Since my milter spam filter crashes and sendmail disables milter filtering
for 50 seconds because of the failure (that is my configuration, and I would
rather not touch it), a lot of spam is getting through: about 30-100 spam
messages every time this bug hits.

Please, increase the priority of this bug a bit... It is hitting. Hard.
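
For anyone who wants to spot such messages before they reach the classifier, here is a rough sketch, assuming Python 3 and only the standard ``email`` package. It is not part of SpamBayes, and the function name ``mislabeled_8bit_parts`` is made up for this example. It reports MIME parts that are declared 7bit (or declare nothing, which defaults to 7bit) yet contain bytes at or above 0x80.

```python
import email


def mislabeled_8bit_parts(raw_message):
    """Return parts declared (or defaulting to) 7bit that contain 8-bit bytes."""
    msg = email.message_from_bytes(raw_message)
    suspects = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if cte != "7bit":
            continue
        # For 7bit parts, decode=True returns the body bytes unchanged.
        body = part.get_payload(decode=True) or b""
        if any(byte >= 0x80 for byte in body):
            suspects.append(part)
    return suspects


if __name__ == "__main__":
    raw = (b"Content-Type: text/plain; charset=us-ascii\r\n"
           b"Content-Transfer-Encoding: 7bit\r\n"
           b"\r\n"
           b"caf\xe2 latte\r\n")
    print(len(mislabeled_8bit_parts(raw)))
```

A filter could route flagged messages through a more forgiving decoding path instead of letting the exception take the whole process down.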


----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-bugs
