Public bug reported:

I had to select "I don't know" for the package, because python2.5 could not be found. Strange.
The bug is described here:
http://mail.python.org/pipermail/python-bugs-list/2007-February/037082.html

John Nagle has explained and solved the bug:

Found the problem. In sgmllib.py for Python 2.5, in convert_charref, the code for handling character escapes assumes that ASCII characters have values up to 255. But the correct limit is 127, of course.

If a Unicode string is run through SGMLParser, and that string has a character in an attribute with a value between 128 and 255, which is valid in Unicode, the value is passed through as a character with "chr", creating a one-character invalid ASCII string.

Then, when the bad string is later converted to Unicode as the output is assembled, the UnicodeDecodeError exception is raised.

So the fix is to change 255 to 127 in convert_charref in sgmllib.py, as shown below. This forces characters above 127 to be expressed with escape sequences.

Please patch accordingly. Thanks.

    def convert_charref(self, name):
        """Convert character reference, may be overridden."""
        try:
            n = int(name)
        except ValueError:
            return
        if not 0 <= n <= 127:  # ASCII ends at 127, not 255
            return
        return self.convert_codepoint(n)

** Affects: ubuntu
     Importance: Undecided
         Status: New

Python2.5 Unicode-bug when using sgmllib.py: UnicodeDecodeError
https://bugs.launchpad.net/bugs/240929
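
The failure mode described in the report can be illustrated outside sgmllib. What follows is only a minimal sketch, not the sgmllib source. It is written for Python 3, where a one-byte `bytes` object stands in for the Python 2 `str` that "chr" produced. The names `convert_charref_sketch`, `decodes_as_ascii` and the `limit` parameter are hypothetical and exist only for this illustration.

```python
def convert_charref_sketch(name, limit=127):
    """Return a one-byte string for a numeric character reference, or None.

    Hypothetical stand-in for the range check in convert_charref:
    limit=255 mimics the behaviour the report describes as buggy,
    limit=127 mimics the proposed fix.
    """
    try:
        n = int(name)
    except ValueError:
        return None
    if not 0 <= n <= limit:
        # Out of range: the reference is not converted, so it stays
        # an escape sequence, as the report describes.
        return None
    return bytes([n])  # stands in for Python 2's chr(n)


def decodes_as_ascii(data):
    """Report whether a byte string survives an ASCII decode."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# With the old limit, "233" becomes a single byte that is not valid ASCII,
# so a later implicit ASCII decode would fail.
old = convert_charref_sketch("233", limit=255)

# With the proposed limit, the same reference is rejected up front.
new = convert_charref_sketch("233", limit=127)
```

Here `decodes_as_ascii(old)` is false, which corresponds to the UnicodeDecodeError in the report, while `new` is `None` and no invalid byte is ever produced.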
