I have a large amount of data in a PostgreSQL database with the encoding SQL_ASCII. Most of the recent data is UTF-8, but data from several years ago could be in some other, unknown encoding. Being honest with myself, I am not even sure the recent data is always UTF-8: it is entered on web forms, and I wouldn't be surprised if data in other encodings is slipping in.
Up to this point I have just ignored the problem; on the web side of things everything works well enough. But now I am required to stuff this data into XML datasets and I am, of course, having problems. My preference would be to force the data into UTF-8 even if the result is ultimately an incorrect encoding translation, but this isn't working. The code below reproduces my most recent problem:

    import xml.dom.minidom
    print chr(3).encode('utf-8')
    dom = xml.dom.minidom.parseString("<test>%s</test>" % chr(3).encode('utf-8'))

chr(3) is the ASCII "end of text" (ETX) control character. I would have thought that trying to encode it to UTF-8 would fail, but it doesn't; I don't get a failure until we get into XML land and the parser complains. My question is: why doesn't encode() blow up? It seems to me that encode() shouldn't output anything that parseString() can't handle.

Sorry in advance if this post is ugly; it is sent through the Google Groups interface and Google mangles the entry sometimes.
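In case Google mangles the code above as well, here is a slightly expanded version of the same repro (a sketch assuming Python 2.x and the stdlib expat-backed minidom; the variable names and the try/except are mine, added only to show where the failure actually surfaces):

    import xml.dom.minidom

    ctrl = chr(3)                    # the ETX ("end of text") control byte
    encoded = ctrl.encode('utf-8')   # no exception raised here
    print repr(encoded)              # prints '\x03'

    # the failure only shows up once the parser sees the byte
    try:
        xml.dom.minidom.parseString("<test>%s</test>" % encoded)
    except Exception, e:
        print "parseString failed:", e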