Before looking for a bug, create a test case and verify that the behavior it shows isn't simply the expected behavior.
I mean, of *course* there'll be an attempt to fetch whatever DTD is mentioned in a DOCTYPE when your XML processor is validating, and it's quite reasonable to fetch one even when not validating, because there's more info in a DTD than just what's needed for validation (entity declarations and attribute defaults, for example).

AFAICT, the main problem the W3C is talking about is not what happens when a legitimate DTD request occurs in response to a system ID in a DOCTYPE, but rather when there really shouldn't be such a request -- that is, when the URL being fetched is just a namespace ID. What evidence is there that Python's standard XML libs are making illegitimate requests for namespace IDs? I see none in that W3C blog post. Show us a reproducible example of a namespace ID being subjected to a fetch attempt while reading in an XML document with standard Python APIs. I don't think it's happening at all.

Apparently there *is* evidence that urllib is ultimately called by something quite often to grab XHTML DTDs, and the HTTP response may not always be handled very well. But assuming it's part of normal XML processing, we have no details about whether it's a legitimate call for a DOCTYPE or an illegitimate one for a namespace ID, and whether it's really unreasonable to keep trying to fetch every time the reference is encountered. It sounds like application-level issues, not misbehavior by Python's SAX or DOM APIs.

That blog author also seems to feel it's unreasonable for an app to seek out the same network-bound resource repeatedly, which is a sound position in some document and application contexts, but not others; it really depends on the situation, doesn't it? Sure, an app developer might be able to configure the parser to not read external entities, or could cache responses to minimize that traffic, if necessary, but it's not an obligation or necessarily a bug if that doesn't happen. And the XML spec is silent on the issue of unfetchable external entities anyway.
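For anyone who wants to try the kind of test case I'm asking for, here is a rough sketch using only the standard xml.sax API. The sample document and the names RecordingResolver and requested_system_ids are mine, made up for illustration; the resolver hands back an empty stream instead of going to the network, so all it does is record which system IDs the parser asks for. It assumes a Python 3 xml.sax with the Expat-based reader, where (as far as I know) external general entities are off by default in recent releases, so the feature is set explicitly both ways.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical sample: an XHTML document with both a DOCTYPE system ID
# and a namespace ID, so we can see which of the two gets requested.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>t</title></head>
  <body/>
</html>
"""


class RecordingResolver(EntityResolver):
    """Record every system ID the parser asks for; never touch the network."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        # Hand back an empty in-memory stream in place of the real resource.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))
        return source


def requested_system_ids(data, external_ges):
    """Parse data and return the system IDs the parser tried to resolve."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, external_ges)
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(data))
    return resolver.requested


if __name__ == "__main__":
    print("external entities on: ", requested_system_ids(DOC, True))
    print("external entities off:", requested_system_ids(DOC, False))
```

If the namespace ID ever shows up in that list, then we have something to talk about. The same resolver hook is also where an application could serve a locally cached copy of the DTD rather than an empty stream.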
To answer your question, legitimate DTD processing is probably a feature of the underlying parser (Expat). I assume it calls back to a urllib-based resolver. But like I said, there's no bug there; just a lack of features to encourage application developers to use XML catalogs.

I don't know if this helps.. or am I missing something here?

Guido van Rossum wrote:
> [+xml-sig]
>
> On Feb 8, 2008 8:03 PM, Keith Dart <[EMAIL PROTECTED]> wrote:
> >
> > http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
> >
> > This is interesting. I've noticed that when you use Python's XML
> > package in validating mode it does try to fetch the DTD. Be careful
> > when you use that.
>
> I think this is worth filing a bug, but I'd like to understand better
> where the call is made. I can't find any places in the standard xml
> package that does this -- but I'm not all that familiar with the code.
> Do you know if it's in the base xml package, or in etree, or in the
> separately distributed "XMLplus"? Any details you have would be
> appreciated (like a traceback from the point where the call is made).
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig