On 20.02.2013 23:45, R. David Murray wrote:
> I don't believe it does.  The DTD URL is, if I remember correctly,
> specified as an identifier.  The fact that you can often also download the
> DTD from the location specified by the identifier is a secondary effect.
> 
> But, it's been a *long* time since I looked at XML :)

A DTD may have an identifier and a resource locator (local file or URL).
It depends on which kind of DTD is used (internal, external public or
external system), e.g. <!DOCTYPE name PUBLIC "identifier" "url/file">.
For external DTDs a parser may choose to cache a DTD or map the DTD
identifier to its own set of DTDs.
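
To make the identifier/locator distinction concrete, here is a minimal
sketch using xml.dom.minidom, which exposes both values on the doctype
node. The helper name doctype_ids and the two sample documents are made up
for illustration; as far as I know minidom only records the values and does
not fetch the DTD.

```python
from xml.dom import minidom


def doctype_ids(xml_text):
    """Return (name, public identifier, system identifier) of the DOCTYPE.

    Hypothetical helper for illustration only; it just reads what the
    parser recorded and does not retrieve the DTD.
    """
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.publicId, doctype.systemId


# External public DTD: an identifier plus a resource locator.
PUBLIC_DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"/>'
)

# External system DTD: only a resource locator (sample file name).
SYSTEM_DOC = '<!DOCTYPE note SYSTEM "note.dtd"><note/>'

public_ids = doctype_ids(PUBLIC_DOC)
system_ids = doctype_ids(SYSTEM_DOC)
print(public_ids)
print(system_ids)
```

A parser that keeps a local catalog would look up the public identifier
first and only fall back to the system identifier if it has no match.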

As far as I know a parser doesn't have to download a DTD unless it runs
in validation mode. Among Python's XML libraries, only xml.sax and
xml.dom.pulldom download DTDs, see
https://pypi.python.org/pypi/defusedxml#python-xml-libraries

DTD retrieval is not as severe as external entity expansion. With
external entities like <!ENTITY passwd SYSTEM "file:///etc/passwd"> an
attacker is actually able to retrieve files and circumvent firewalls if
the application returns parts of the parsed XML document.
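
As a rough sketch of how an application can guard against this with
xml.sax, the example below switches off external general entities before
parsing a document that declares the entity from above. The names
TextCollector and parse_without_external_entities are invented for this
example, and this is only one possible mitigation, not a complete defence.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collect all character data seen by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(data):
    """Parse XML bytes with external general entities switched off.

    Hypothetical helper for illustration; returns the collected text.
    """
    parser = xml.sax.make_parser()
    # Must be set before parsing starts; the parser then skips
    # references to external general entities instead of loading them.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


DOC = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY passwd SYSTEM "file:///etc/passwd">
]>
<data>before &passwd; after</data>"""

text = parse_without_external_entities(DOC)
print(repr(text))
```

With the feature disabled the reference to &passwd; is never resolved, so
no file content can end up in whatever the application sends back.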

Most XML parsers expand entities and lots of them even expand external
entities. Daniel Veillard (libxml2) has explained that entity expansion
is required for XPath() and IIRC for features like XSL, too.

Nowadays most XML parsers and libraries have options to disable certain
features. Python's standard library lacks options for some of these
features and silently ignores some other settings.

Everything is documented at https://pypi.python.org/pypi/defusedxml, too.

Christian
