Public bug reported:

I had to select "I don't know" for the package because the reporting
tool couldn't find python2.5, which is strange.

The bug is described here:
http://mail.python.org/pipermail/python-bugs-list/2007-February/037082.html

John Nagle has explained and solved the bug:

Found the problem. In sgmllib.py for Python 2.5, in convert_charref, the
code for handling character escapes assumes that ASCII characters have
values up to 255.

But the correct limit is 127, of course.

If a Unicode string is run through SGMLParser, and that string has a
character in an attribute with a value between 128 and 255, which is valid
in Unicode, the value is passed through as a character with "chr", creating a
one-character invalid ASCII string.  

Then, when the bad string is later converted to Unicode as the output is
assembled, the UnicodeDecodeError exception is raised. 

So the fix is to change 255 to 127 in convert_charref in sgmllib.py,
as shown below.  This forces characters above 127 to be expressed with
escape sequences.  Please patch accordingly.  Thanks.

def convert_charref(self, name):
    """Convert character reference, may be overridden."""
    try:
        n = int(name)
    except ValueError:
        return
    if not 0 <= n <= 127:  # ASCII ends at 127, not 255
        return
    return self.convert_codepoint(n)
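
The failure mode described above can be illustrated with a minimal
sketch. It is written for Python 3, where sgmllib no longer exists, so
it only imitates the Python 2.5 behaviour and is not the library code.
The names convert_charref_sketch and assemble_output are hypothetical,
bytes([n]) stands in for Python 2's chr(n), and the explicit ASCII
decode stands in for the implicit conversion Python 2 performs when a
byte string is combined with a Unicode string.

```python
def convert_charref_sketch(name, limit):
    """Mimic sgmllib's convert_charref with a configurable upper limit.

    Hypothetical helper: bytes([n]) stands in for Python 2's chr(n),
    which returns a one-byte str.
    """
    try:
        n = int(name)
    except ValueError:
        return None
    if not 0 <= n <= limit:
        return None  # the caller leaves the reference as an escape sequence
    return bytes([n])


def assemble_output(prefix, raw):
    """Imitate Python 2's implicit ASCII decode when joining str to unicode."""
    return prefix + raw.decode("ascii")


# Old limit (255): "&#200;" becomes a lone non-ASCII byte, and assembly fails.
bad = convert_charref_sketch("200", 255)
try:
    assemble_output("caf", bad)
    old_limit_failed = False
except UnicodeDecodeError:
    old_limit_failed = True

# New limit (127): the reference is rejected and stays escaped.
rejected = convert_charref_sketch("200", 127)

# A plain ASCII reference still converts under the new limit.
ok = assemble_output("caf", convert_charref_sketch("101", 127))
```

With the limit at 255 the decode step raises UnicodeDecodeError, which
matches the reported symptom. With the limit at 127 the reference is
returned unconverted, so no invalid byte reaches the output.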

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
Python2.5 Unicode-bug when using sgmllib.py:  UnicodeDecodeError
https://bugs.launchpad.net/bugs/240929
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
