I currently have quite a big problem with minidom and special characters (for example ü) in HTML.
Let's say I have the following input file:
--------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<body>
ü
</body>
</html>
--------------------------------------------------
And the following Python script:
--------------------------------------------------
from xml.dom import minidom

if __name__ == '__main__':
    doc = minidom.parse('test2.html')
    f = open('test3.html', 'w+')
    f.write(doc.toxml())
    f.close()
--------------------------------------------------
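A variant of the script above, as a minimal sketch assuming Python 3, makes the output encoding explicit instead of relying on the platform default. Passing `encoding` to `toxml()` makes it return bytes and add `encoding="utf-8"` to the XML declaration, so the file is opened in binary mode. The helper name `copy_xml` is hypothetical. This only rules out the writing side; it will not bring back a character that was already dropped while parsing.

```python
from xml.dom import minidom


def copy_xml(src_path, dst_path):
    """Hypothetical helper: parse src_path with minidom, write it as UTF-8."""
    doc = minidom.parse(src_path)
    # With an encoding argument, toxml() returns bytes (not str) and puts
    # encoding="utf-8" into the XML declaration, so write in binary mode.
    with open(dst_path, 'wb') as f:
        f.write(doc.toxml(encoding='utf-8'))
```

Usage with the file names from the post would be `copy_xml('test2.html', 'test3.html')`.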
test3.html only has a blank line where the ü should be. It is simply removed.
Any idea how I could solve this problem?
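One possible explanation, assuming the file actually contains the HTML entity `&uuml;` rather than a literal ü (the archived post only shows the rendered character): minidom is an XML parser and does not load the HTML 4.01 DTD named in the DOCTYPE, so it has no definition for HTML named entities. Because the document declares an external DTD, expat may skip such an undefined reference instead of reporting an error. A minimal sketch of one workaround, assuming Python 3 and a UTF-8 input file, rewrites HTML named entities as numeric character references before parsing. XML resolves these without any DTD. The helpers `html_entities_to_charrefs` and `parse_html_file` are hypothetical names. Note that the simple regex would also rewrite entity-like text inside comments or CDATA sections.

```python
import re
from html.entities import name2codepoint
from xml.dom import minidom

# Entities XML predefines itself; leave these untouched.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}

_ENTITY_RE = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def html_entities_to_charrefs(text):
    """Rewrite HTML named entities (e.g. &uuml;) as numeric character
    references (e.g. &#252;), which XML understands without a DTD."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML built-ins and unknown names as-is
        return '&#%d;' % name2codepoint[name]
    return _ENTITY_RE.sub(repl, text)


def parse_html_file(path):
    """Hypothetical helper: read a UTF-8 file, fix entities, parse with minidom."""
    with open(path, encoding='utf-8') as f:
        return minidom.parseString(html_entities_to_charrefs(f.read()))
```

For example, `html_entities_to_charrefs('&uuml; &amp;')` gives `'&#252; &amp;'`, and `parse_html_file('test2.html')` would then return a document whose text node contains the actual ü.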
Best regards,
Horst