On 20.02.2013 23:45, R. David Murray wrote:
> I don't believe it does. The DTD URL is, if I remember correctly,
> specified as an identifier. The fact that you can often also download the
> DTD from the location specified by the identifier is a secondary effect.
>
> But, it's been a *long* time since I looked at XML :)

A DTD may have an identifier and a resource locator (local file or URL).
Which of the two is present depends on the kind of DTD (internal, external
public or external system), e.g.

    <!DOCTYPE name PUBLIC "identifier" "url/file">

For external DTDs a parser may choose to cache a DTD or to map the DTD
identifier to its own set of DTDs. As far as I know a parser doesn't have
to download a DTD unless it runs in validation mode. Of Python's XML
libraries, only xml.sax and xml.dom.pulldom download DTDs, see
https://pypi.python.org/pypi/defusedxml#python-xml-libraries

DTD retrieval is not as severe as external entity expansion. With external
entities like

    <!ENTITY passwd SYSTEM "file:///etc/passwd">

an attacker is actually able to read files and circumvent firewalls if the
application returns parts of the parsed XML back to the attacker. Most XML
parsers expand entities and lots of them even expand external entities.
Daniel Veillard (libxml2) has explained that entity expansion is required
for XPath() and, IIRC, for features like XSL, too.

Nowadays most XML parsers and libraries have options to disable certain
features. Python's standard library doesn't have options for some features,
and it silently ignores some other settings. Everything is documented at
https://pypi.python.org/pypi/defusedxml, too.

Christian
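
To illustrate what "disabling a feature" can look like in practice, here is a minimal sketch that switches off external general entities in xml.sax before parsing. It assumes the default expat-based parser returned by xml.sax.make_parser(). The names TextCollector and parse_without_external_entities are made up for this example, and the file:// path is a placeholder that is never opened. It is not the approach defusedxml itself takes, just the plain standard-library switch.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical helper: collects all character data seen by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse xml_bytes with external general entities switched off.

    Returns the concatenated text content of the document.
    """
    parser = xml.sax.make_parser()
    # Must be set before parsing starts; with the feature off, references
    # to external general entities are skipped instead of being resolved.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


if __name__ == "__main__":
    # Placeholder system identifier; it is never fetched in this sketch.
    doc = (
        b'<!DOCTYPE root ['
        b'<!ENTITY ext SYSTEM "file:///nonexistent/secret.txt">'
        b']><root>a&ext;b</root>'
    )
    print(parse_without_external_entities(doc))  # prints: ab
```

The reference &ext; is simply dropped from the output, so nothing from the referenced location can leak back to whoever supplied the document.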