Dieter Maurer, 10.05.2010 09:07:
Stefan Behnel wrote at 2010-5-10 08:57 +0200:
Dieter Maurer, 10.05.2010 07:50:
Peterson, Wayne wrote at 2010-5-8 23:43 -0700:
I am parsing an XML file with Python 2.6.5 minidom in Windows and it is
mostly working but minidom seems to have problems dealing with Windows
cr/lf characters. It creates an extra text node that needs to be ignored
instead of just returning the XML elements. I have tried different
methods of opening the file but it doesn't seem to make a difference. It
is happiest when reading a file in Unix format.
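
[Editorial sketch, not part of the original message.] The extra text nodes described here can be reproduced with a small example. It is written for Python 3 (the thread concerns Python 2.6.5), and the sample documents and the helper `element_children` are hypothetical names chosen for illustration. It only shows one common way to skip the text nodes; it does not settle whether the line endings themselves are handled correctly.

```python
from xml.dom import minidom

def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

# Hypothetical sample documents, identical except for the line endings.
unix_doc = b"<root>\n<a/>\n<b/>\n</root>"
windows_doc = b"<root>\r\n<a/>\r\n<b/>\r\n</root>"

unix_root = minidom.parseString(unix_doc).documentElement
windows_root = minidom.parseString(windows_doc).documentElement

# The whitespace between the tags is kept as text nodes, so childNodes
# holds more entries than there are elements.
unix_names = [e.tagName for e in element_children(unix_root)]
windows_names = [e.tagName for e in element_children(windows_root)]

print(len(unix_root.childNodes), unix_names)
print(len(windows_root.childNodes), windows_names)
```

Filtering on `nodeType` keeps the calling code independent of how much whitespace the file contains between elements.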

The parser should not see these "cr/lf" characters at all.

Python strings themselves use only "\n" (aka "lf") to delimit lines.
The "\r" (aka "cr") should only be introduced when those lines
are written to text files. And they should be removed when
those lines are read in again.
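
[Editorial sketch, not part of the original message.] The translation described above can be observed directly. The example assumes Python 3, where the `newline` argument of `open()` makes the behaviour explicit and independent of the platform; in Python 2.6 the equivalent distinction is between the "r"/"w" and "rb"/"wb" modes on Windows. The scratch file path is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

# Text mode with newline="\r\n": every "\n" is written as "\r\n".
with open(path, "w", newline="\r\n") as f:
    f.write("first\nsecond\n")

# Binary mode shows the bytes exactly as they are stored on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with the default newline=None translates "\r\n" back to "\n".
with open(path, "r") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```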

Are you sure that you access your files as "text" files?

The correct way to parse XML files is as binary data.
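
[Editorial sketch, not part of the original message.] The message does not give its reasoning. One commonly cited argument, stated here as an assumption, is that an XML document declares its own encoding, so the parser should receive the undecoded bytes. The example below assumes Python 3 and uses a hypothetical file name and sample content.

```python
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample file: Latin-1 encoded, with Windows line endings.
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
data = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\r\n'
    "<root>\r\n<name>Andr\u00e9</name>\r\n</root>"
).encode("iso-8859-1")

with open(path, "wb") as f:
    f.write(data)

# Opened in binary mode, so the parser sees the raw bytes and decodes
# them according to the encoding named in the XML declaration.
with open(path, "rb") as f:
    doc = minidom.parse(f)

name = doc.getElementsByTagName("name")[0].firstChild.data
print(name)
```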

Why do you think so?

The default "minidom" parser seems not to expect "\r\n" line endings....

Interesting. Then this might really be a bug. There was a change in Python 2.6.5 that broke universal newline handling for the codecs module; that might be what is hitting here.

However, according to what the OP described, the cr/lf characters turn up correctly now, so ISTM that it's the plain '\n' line ending that needs fixing.

Stefan
_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
