I currently have quite a big problem with minidom and special characters (for example ü) in HTML.
Let's say I have the following input file:
--------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<body>
ü
</body>
</html>
--------------------------------------------------
And the following Python script:
--------------------------------------------------
from xml.dom import minidom

if __name__ == '__main__':
    # Parse the input file and write the serialized DOM back out.
    doc = minidom.parse('test2.html')
    f = open('test3.html', 'w+')
    f.write(doc.toxml())
    f.close()
--------------------------------------------------
test3.html only has a blank line where the ü should be. It is simply removed.
Any idea how I could solve this problem?
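For reference, here is a minimal, self-contained sketch of the round trip I would expect to work. It rests on assumptions I have not confirmed: that the problem is encoding-related, that the input is stored as ISO-8859-1, and that the file says so in its XML declaration (my test2.html above declares no encoding). It uses Python 3 semantics, where toxml(encoding=...) returns bytes. The names SOURCE and roundtrip are made up for this example, and the DOCTYPE is left out to keep it short.

```python
from xml.dom import minidom

# Assumption: the document is ISO-8859-1 encoded and declares that encoding.
SOURCE = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<html>\n<body>\n\u00fc\n</body>\n</html>\n'
).encode('iso-8859-1')


def roundtrip(data, encoding='utf-8'):
    """Parse XML bytes and serialize the DOM back to bytes in `encoding`."""
    doc = minidom.parseString(data)
    # With an explicit encoding, toxml() returns bytes and writes a
    # matching encoding attribute into the XML declaration.
    return doc.toxml(encoding=encoding)


if __name__ == '__main__':
    # Write in binary mode so no implicit re-encoding happens on output.
    with open('test3.html', 'wb') as f:
        f.write(roundtrip(SOURCE))
```

Writing the bytes in binary mode keeps the encoding named in the output's XML declaration consistent with what actually ends up in the file.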
Regards, Horst
