Gmane isn't updating so I can't really reply to the message (not visible in gmane) that I want to, but I saw the following solution proposed:

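def ourparse(text):
    # Blindly turn unicode input into UTF-8 bytes, whatever the document declares
    if isinstance(text, unicode):
        text = text.encode('UTF-8')
    # ... the byte string is then passed on to the XML parser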

Now consider what will happen if you do the following:

text = u'<?xml version="1.0" encoding="ISO-8859-1" ?><foo>Some non-ascii characters here</foo>'

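What happens is that text is encoded to a UTF-8 byte string (an 8-bit string, not ASCII). It's then passed to a hopefully compliant XML parser. This XML parser sees an 8-bit string and checks the encoding declaration for more information on how to decode it. It will therefore assume the string is in latin-1, even though the bytes are actually UTF-8. Depending on the declared encoding, the parse will either break with an obscure error or silently produce garbled characters (with latin-1, where every byte value maps to some character, most likely the latter), and the developer doing this is probably very confused.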
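
To make the failure mode concrete, here is a minimal sketch of the mismatch. It assumes Python 3 (where the old unicode type is plain str) and uses the standard library's xml.etree.ElementTree as a stand-in for whatever parser is actually in use. DOC, parse_text, matching and mismatched are names I made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: it declares ISO-8859-1 and contains
# one non-ASCII character (e with acute accent, U+00E9).
DOC = '<?xml version="1.0" encoding="ISO-8859-1" ?><foo>caf\u00e9</foo>'


def parse_text(data):
    """Parse a byte string and return the text of the root element."""
    return ET.fromstring(data).text


# Bytes that agree with the declaration: the parser decodes them correctly.
matching = parse_text(DOC.encode('ISO-8859-1'))

# Bytes that contradict the declaration, which is what a blind
# encode('UTF-8') produces for this document.
mismatched = parse_text(DOC.encode('UTF-8'))

print(repr(matching))
print(repr(mismatched))
```

With a parser that honours the declaration, matching should come back intact, while mismatched comes back with the accented character replaced by two latin-1 characters and no exception raised at all. That silent corruption is arguably worse than an error.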

This is why it's better to refuse to guess.



