Re: Getting Unicode decode error using lxml.iterparse

Stefan Behnel Wed, 23 May 2018 10:28:13 -0700

dieter wrote on 23.05.2018 at 08:25:
> If the encoding is not specified, "lxml" will try to determine it
> and finally defaults to "utf-8" (which seems to be the correct encoding
> for your case).


Being an XML parser, it does not try to guess the encoding. XML parsers are
designed to reject non-wellformed content, and that includes anything that
cannot be decoded.

In short, if no encoding is specified, then it's UTF-8, but if there is an
XML declaration that specifies an encoding, then it uses that encoding.

Here, the encoding is specified as UTF-8, so that's what the parser uses.
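
A minimal sketch of that rule, using xml.etree.ElementTree from the standard
library. The sample documents and the helper name texts() are made up for
illustration and are not taken from the OP's code:

```python
import io
import xml.etree.ElementTree as ET


def texts(data: bytes) -> list:
    """Parse raw bytes with iterparse and collect each element's text."""
    return [elem.text for _event, elem in ET.iterparse(io.BytesIO(data))]


# No XML declaration: the bytes are read as UTF-8.
utf8_texts = texts("<r>\u00e9</r>".encode("utf-8"))

# The declaration names ISO-8859-1, so the parser decodes with that.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><r>\u00e9</r>'
latin1_texts = texts(latin1_doc.encode("iso-8859-1"))

# ISO-8859-1 bytes without a declaration are not valid UTF-8, so the
# document is rejected instead of being decoded by guesswork.
try:
    texts("<r>\u00e9</r>".encode("iso-8859-1"))
    rejected = False
except ET.ParseError:
    rejected = True

print(utf8_texts, latin1_texts, rejected)
```

Note that the input is passed as bytes, so the parser itself does the
decoding according to the rule above.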

Note, however, that the library that the OP uses is not lxml but xml.etree,
i.e. the ElementTree XML support in the standard library.

Stefan

-- 
https://mail.python.org/mailman/listinfo/python-list