Barak, Ron, 04.05.2010 16:11:
I'm parsing XML files using ElementTree from xml.etree (see the code
below and the attached xml_parse_example.py).
However, I'm coming across input XML files (an example, tmp.xml, is
attached) that include invalid characters and produce the
following traceback:
$ python xml_parse_example.py
Traceback (most recent call last):
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 6, column 34
I hope you are aware that this means that the input you are
parsing is not XML. It's best to reject the file and tell the
producers that they are writing broken output files. You
should always fix the source, instead of trying to make sense
out of broken input in fragile ways.
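Rejecting such a file can be as simple as catching the parser error and reporting where parsing stopped. The following is only a minimal sketch, assuming Python 3, where ElementTree raises xml.etree.ElementTree.ParseError instead of the ExpatError shown in the traceback above; the function name parse_or_reject is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(data):
    """Return the root element, or raise ValueError if the input is not XML.

    parse_or_reject is a hypothetical helper name, not part of ElementTree.
    """
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple reported by the parser.
        line, column = exc.position
        raise ValueError(
            "rejected: not well-formed XML at line %d, column %d"
            % (line, column)
        ) from exc
```

The caller can then pass the message on to whoever produces the files, rather than trying to repair the data.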
The XML file seems to be valid XML (all XML viewers I tried were able to read
it).
This is what xmllint gives me:
-----------------------
$ xmllint /home/sbehnel/tmp.xml
tmp.xml:6: parser error : Char 0x0 out of allowed range
<m_sanApiName1>"MainStorage_snap
^
tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6
<m_sanApiName1>"MainStorage_snap
^
tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5
<m_sanApiName1>"MainStorage_snap
^
tmp.xml:6: parser error : Premature end of data in tag database line 4
<m_sanApiName1>"MainStorage_snap
^
-----------------------
The file contains 0-bytes (NUL characters, 0x0), which XML does not allow, so it is clearly not XML.
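To show the producers exactly where the 0-bytes are, the raw bytes can be scanned before any parsing is attempted. This is a rough sketch, assuming Python 3 and a file small enough to read into memory; find_nul_bytes is a made-up name, and the columns are plain byte offsets within each line, which need not match the column numbers a parser reports.

```python
def find_nul_bytes(data):
    """Return (line, column) pairs for every NUL byte in data.

    Lines are 1-based, columns are 0-based byte offsets within the line.
    find_nul_bytes is a hypothetical helper, not a library function.
    """
    positions = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        col = line.find(b"\x00")
        while col != -1:
            positions.append((lineno, col))
            col = line.find(b"\x00", col + 1)
    return positions
```

For a file on disk, read it in binary mode first, for example with open("tmp.xml", "rb"), and pass the resulting bytes to the function. An empty list means no 0-bytes were found; it does not mean the file is well-formed.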
Stefan