[issue2124] xml.sax and xml.dom fetch DTDs by default

2013-02-27 Thread Raynard Sandwick
Raynard Sandwick added the comment: I have opened issue #17318 to try to specify the problem better. While I do think that catalogs are the correct fix for the validation use case (and thus would like to see something more out-of-the-box in that vein), the real trouble is that users are often

2012-01-13 Thread Brian Visel
Brian Visel aeon.descrip...@gmail.com added the comment: ..still an issue. -- nosy: +Brian.Visel ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2124 ___

2012-01-13 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: And my position still remains the same: this is not a bug. Applications affected by this need to use the APIs that are in place precisely to deal with this issue. So I propose to close this report as invalid.
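
For readers following the thread today: one of those APIs is the SAX feature `xml.sax.handler.feature_external_ges`. The sketch below, assuming the stock expat-based reader returned by `xml.sax.make_parser()`, shows that with the feature off the reader skips the external DTD subset and never consults the entity resolver. Per the `xml.sax` documentation, Python 3.7.1 and later leave this feature off by default; earlier versions fetched the DTD unless it was switched off. `RecordingResolver` and `requested_entities` are illustrative names, not library APIs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

XHTML_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html><body>hello</body></html>
"""


class RecordingResolver(EntityResolver):
    """Illustrative resolver: logs each request and serves an empty local DTD."""

    def __init__(self):
        self.requests = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = xml.sax.InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))  # nothing goes over the network
        return source


def requested_entities(data, fetch_external):
    """Parse `data` and return the (publicId, systemId) pairs the reader asked for."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, fetch_external)
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(data))
    return resolver.requests


print(requested_entities(XHTML_DOC, fetch_external=False))  # []
print(requested_entities(XHTML_DOC, fetch_external=True))
```

The trade-off of turning the feature off is that anything declared only in the DTD (for example XHTML's `&nbsp;`) is not expanded.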

2012-01-13 Thread Brian Visel
Brian Visel aeon.descrip...@gmail.com added the comment: Of course, you can do as you like. http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

2012-01-13 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment: Well, the issue is clearly underspecified, and different people read different things into it. I take your citation of the W3C blog entry that you are asking that caching should be employed. I read the issue entirely different, namely that

2012-01-13 Thread Paul Boddie
Paul Boddie p...@boddie.org.uk added the comment: Note that Python 3 provided a good opportunity for doing the minimal amount of work here - just stop things from accessing remote DTDs - but I imagine that even elementary standard library improvements of this kind weren't made (let alone the
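
Paul's minimal option, never reaching out to the network while still allowing local DTDs, can be approximated in application code. A sketch, assuming the expat-based reader; `NoRemoteResolver` and the `REMOTE_SCHEMES` set are illustrative, and an empty stream stands in for any remote DTD, so declarations it would have supplied are simply absent.

```python
import io
import urllib.parse
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

REMOTE_SCHEMES = {"http", "https", "ftp"}


class NoRemoteResolver(EntityResolver):
    """Illustrative resolver: never downloads, leaves local system IDs alone."""

    def __init__(self):
        self.blocked = []

    def resolveEntity(self, publicId, systemId):
        scheme = urllib.parse.urlsplit(systemId or "").scheme.lower()
        if scheme in REMOTE_SCHEMES:
            self.blocked.append(systemId)
            # An empty stream replaces the remote DTD, so nothing is fetched.
            source = xml.sax.InputSource(systemId)
            source.setByteStream(io.BytesIO(b""))
            return source
        # Local or relative IDs: default behaviour (the reader opens the file).
        return super().resolveEntity(publicId, systemId)


parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # keep local DTDs working
resolver = NoRemoteResolver()
parser.setEntityResolver(resolver)
parser.setContentHandler(ContentHandler())
parser.parse(io.BytesIO(b'<!DOCTYPE note SYSTEM "http://example.com/note.dtd"><note/>'))

print(resolver.blocked)  # ['http://example.com/note.dtd']
```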

2010-11-12 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- assignee: akuchling ->

2010-07-20 Thread Mark Lawrence
Mark Lawrence breamore...@yahoo.co.uk added the comment: Does anybody know if users are still experiencing problems with this issue? -- nosy: +BreamoreBoy versions: +Python 2.7, Python 3.1, Python 3.2 -Python 2.6

2010-07-20 Thread Jean-Paul Calderone
Jean-Paul Calderone exar...@twistedmatrix.com added the comment: Yes.

2009-02-03 Thread Damien Neil
Damien Neil ne...@misago.org added the comment: I just ran into this problem. I was very surprised to realize that every time the code I was working on parsed a docbook file, it generated several HTTP requests to oasis-open.org to fetch the docbook DTDs. I attempted to fix the issue by adding

2009-02-03 Thread Jean-Paul Calderone
Changes by Jean-Paul Calderone exar...@divmod.com: -- nosy: +exarkun

2009-02-03 Thread Jean-Paul Calderone
Jean-Paul Calderone exar...@divmod.com added the comment: Though it's inconvenient to do so, you can arrange to have the locator available from the entity resolver. The content handler's setDocumentLocator method will be called early on with the locator object. So you can give your entity
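
A sketch of the arrangement Jean-Paul describes, assuming the expat-based reader (whose `parse()` method calls `setDocumentLocator` before any data is read): the content handler keeps the locator, the resolver holds a reference to the handler, and a relative system ID is joined against `locator.getSystemId()`. The class names are illustrative, and the empty byte stream stands in for a cached DTD.

```python
import io
import urllib.parse
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class LocatorKeepingHandler(ContentHandler):
    """Keeps the locator the reader passes in before parsing begins."""

    def __init__(self):
        super().__init__()
        self.locator = None

    def setDocumentLocator(self, locator):
        self.locator = locator


class BaseAwareResolver(EntityResolver):
    """Illustrative resolver: joins relative system IDs onto the document's own ID."""

    def __init__(self, handler):
        self.handler = handler
        self.resolved = []

    def resolveEntity(self, publicId, systemId):
        locator = self.handler.locator
        base = locator.getSystemId() if locator is not None else None
        absolute = urllib.parse.urljoin(base or "", systemId)
        self.resolved.append(absolute)
        source = xml.sax.InputSource(absolute)
        source.setByteStream(io.BytesIO(b""))  # stand-in for a cached DTD
        return source


top = xml.sax.InputSource("http://example.com/docs/file.xml")
top.setByteStream(io.BytesIO(b'<!DOCTYPE note SYSTEM "dtd/note.dtd"><note/>'))

handler = LocatorKeepingHandler()
resolver = BaseAwareResolver(handler)
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)
parser.setContentHandler(handler)
parser.setEntityResolver(resolver)
parser.parse(top)

print(resolver.resolved)  # ['http://example.com/docs/dtd/note.dtd']
```

This relies on the top-level document having a system ID; for a bare stream with none, `getSystemId()` returns `None` and the relative ID is passed through unchanged.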

2009-02-03 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment:

> EntityResolver.resolveEntity() is called with the publicId and systemId as arguments. It does not receive a locator.

Sure. But ContentHandler.setDocumentLocator receives it, and you are supposed to store it for the entire parse, to always

2009-02-03 Thread Damien Neil
Damien Neil ne...@misago.org added the comment:

On Feb 3, 2009, at 1:42 PM, Martin v. Löwis wrote:
> Sure. But ContentHandler.setDocumentLocator receives it, and you are supposed to store it for the entire parse, to always know what entity is being processed if you want to.

Where in the

2009-02-03 Thread Damien Neil
Damien Neil ne...@misago.org added the comment: I just discovered another really fun wrinkle in this. Let's say I want to have my entity resolver return a reference to my local copy of a DTD. I write:

    source = xml.sax.InputSource()
    source.setPublicId(publicId)

2009-02-03 Thread Jean-Paul Calderone
Jean-Paul Calderone exar...@divmod.com added the comment: It's indeed possible to provide that as a third-party module; one would have to implement an EntityResolver, and applications would have to use it. If there was a need for such a thing, somebody would have done it years ago. I don't

2009-02-03 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment:

> Where in the following sequence am I supposed to receive the document locator?
>
>     parser = xml.sax.make_parser()
>     parser.setEntityResolver(CachingEntityResolver())
>     doc = xml.dom.minidom.parse('file.xml', parser)

This is DOM parsing, not
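
For the DOM case raised here, `xml.dom.minidom.parse()` accepts a SAX2 parser as its second argument, and a resolver attached to that parser is consulted. A minimal sketch, assuming the expat-based reader and an in-memory table of DTDs (`LocalOnlyResolver` and `LOCAL_DTDS` are illustrative). Note that minidom installs its own content handler (from `xml.dom.pulldom`) on the parser, so the application's handler, and with it the stored-locator approach, is not in play on this path.

```python
import io
import xml.dom.minidom
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges

LOCAL_DTDS = {"http://example.com/note.dtd": b"<!ELEMENT note (#PCDATA)>"}


class LocalOnlyResolver(EntityResolver):
    """Illustrative resolver that serves DTDs from an in-memory table."""

    def __init__(self, table):
        self.table = table
        self.hits = []

    def resolveEntity(self, publicId, systemId):
        self.hits.append(systemId)
        source = xml.sax.InputSource(systemId)
        # Unknown IDs get an empty stream rather than a download.
        source.setByteStream(io.BytesIO(self.table.get(systemId, b"")))
        return source


parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)
resolver = LocalOnlyResolver(LOCAL_DTDS)
parser.setEntityResolver(resolver)

data = b'<!DOCTYPE note SYSTEM "http://example.com/note.dtd"><note>hi</note>'
doc = xml.dom.minidom.parse(io.BytesIO(data), parser)
print(doc.documentElement.tagName, resolver.hits)  # note ['http://example.com/note.dtd']
```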

2009-02-03 Thread Damien Neil
Damien Neil ne...@misago.org added the comment:

On Feb 3, 2009, at 11:23 AM, Martin v. Löwis wrote:
> I don't think this is actually the case. Did you try calling getSystemId on the locator?

EntityResolver.resolveEntity() is called with the publicId and systemId as arguments. It does not

2009-02-03 Thread Martin v. Löwis
Martin v. Löwis mar...@v.loewis.de added the comment:

> The EntityResolver's resolveEntity() method is not, however, passed the base path to resolve the relative systemId from. This makes it impossible to properly implement a parser which caches fetched DTDs.

I don't think this is actually

2009-02-03 Thread Damien Neil
Damien Neil ne...@misago.org added the comment:

On Feb 3, 2009, at 3:12 PM, Martin v. Löwis wrote:
> This is DOM parsing, not SAX parsing.

1) The title of this ticket begins with "xml.sax and xml.dom".
2) I am creating a SAX parser and passing it to xml.dom, which uses it.

So break layers of

2008-02-23 Thread A.M. Kuchling
A.M. Kuchling added the comment: The solution of adding caching, If-Modified-Since, etc. is a good one, but I quail in fear at the prospect of expanding the saxutils resolver into a fully caching HTML agent that uses a cache across processes. We should really be encouraging people to use more

2008-02-23 Thread Martin v. Löwis
Martin v. Löwis added the comment: I may have lost track somewhere: what does have urllib* to do with this issue?

2008-02-17 Thread Virgil Dupras
Virgil Dupras added the comment: -1 on the systematic warnings too, but what I was talking about is a warning that would say "The server you are trying to fetch your resource from is refusing the connection. Don't cha think you misbehave?" only on 5xx and 4xx responses, not on every remote
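
Virgil's narrower variant, warning only when the server answers with an error status, could live in whatever function fetches the DTD. A sketch under assumptions: `fetch_dtd` and its `opener` parameter are illustrative, and the fake opener below replaces the network so the example runs offline. `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, which is where the warning is issued before the error is re-raised.

```python
import urllib.error
import urllib.request
import warnings


def fetch_dtd(url, opener=urllib.request.urlopen):
    """Illustrative fetcher: silent on success, warns on 4xx/5xx before re-raising."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if 400 <= err.code < 600:
            warnings.warn(
                f"{url} answered HTTP {err.code}; the server may be refusing "
                "repeated DTD downloads",
                RuntimeWarning,
                stacklevel=2,
            )
        raise


def throttled_opener(url):
    """Stand-in for a server answering 503 Service Unavailable."""
    raise urllib.error.HTTPError(url, 503, "Service Unavailable", None, None)


status = None
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        fetch_dtd("http://example.com/note.dtd", opener=throttled_opener)
    except urllib.error.HTTPError as err:
        status = err.code

print(status, len(caught))  # 503 1
```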

2008-02-17 Thread ajaksu
ajaksu added the comment: Martin, I agree that simply not resolving DTDs is an unreasonable request (and said so in the blog post). But IMHO there are lots of possible optimizations, and the most valuable would be those darn easy for newcomers to understand and use. In Python, a winning combo

2008-02-16 Thread Virgil Dupras
Virgil Dupras added the comment: The blog page talked about 503 responses. What about issuing a warning on these responses? Maybe it would be enough to make developers aware of the problem? Or what about in-memory caching of the DTDs? Sure, it wouldn't be as good as a catalog or anything,
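
A minimal sketch of the in-memory caching Virgil suggests, assuming the expat-based reader: a resolver keeps fetched DTD bytes in a dict keyed by system ID, so one process downloads each DTD at most once. `MemoryCachingResolver` and `fetch_url` are illustrative; the usage swaps in a fake fetcher so nothing touches the network.

```python
import io
import urllib.request
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


def fetch_url(url):
    """Default fetcher: one network request per call."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class MemoryCachingResolver(EntityResolver):
    """Illustrative per-process cache: each system ID is fetched at most once."""

    def __init__(self, fetch=fetch_url):
        self.fetch = fetch
        self.cache = {}

    def resolveEntity(self, publicId, systemId):
        if systemId not in self.cache:
            self.cache[systemId] = self.fetch(systemId)
        source = xml.sax.InputSource(systemId)
        source.setPublicId(publicId)
        source.setByteStream(io.BytesIO(self.cache[systemId]))
        return source


def parse_with(resolver, data):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(data))


calls = []


def fake_fetch(url):
    calls.append(url)
    return b""


resolver = MemoryCachingResolver(fetch=fake_fetch)
doc = b'<!DOCTYPE note SYSTEM "http://example.com/note.dtd"><note/>'
for _ in range(3):
    parse_with(resolver, doc)

print(calls)  # ['http://example.com/note.dtd']
```

Keying on the raw system ID is a simplification: a relative ID such as `dtd/note.dtd` names different files under different base URLs, so a real cache would first make it absolute. As Paul Boddie points out, a per-process cache also does nothing for short-lived programs such as CGI scripts.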

2008-02-16 Thread Paul Boddie
Paul Boddie added the comment: (Andrew, thanks for making a bug, and apologies for not reporting this in a timely fashion.) Although an in-memory caching solution might seem to be sufficient, if one considers things like CGI programs, it's clear that such programs aren't going to benefit from
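
Paul's point is that a cache has to outlive the process to help CGI-style programs. One way to get that, sketched under stated assumptions (a writable cache directory shared by the processes, a caller-supplied `fetch` callable, and no expiry or If-Modified-Since revalidation), is to store each DTD in a file named after a hash of its system ID. `DiskCachingResolver` is illustrative, not an existing library class.

```python
import hashlib
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import EntityResolver


class DiskCachingResolver(EntityResolver):
    """Illustrative cache shared by every process that points at the same directory."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # callable(url) -> bytes; the only place network I/O happens

    def _path(self, system_id):
        digest = hashlib.sha256(system_id.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".dtd")

    def resolveEntity(self, publicId, systemId):
        path = self._path(systemId)
        if not os.path.exists(path):
            data = self.fetch(systemId)
            # Write to a temporary name, then rename into place, so another
            # process never reads a half-written file.
            fd, tmp_path = tempfile.mkstemp(dir=self.cache_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp_path, path)
        source = xml.sax.InputSource(systemId)
        with open(path, "rb") as f:
            source.setByteStream(io.BytesIO(f.read()))
        return source


calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<!ELEMENT note (#PCDATA)>"


with tempfile.TemporaryDirectory() as cache_dir:
    # Two resolver instances stand in for two separate CGI processes.
    first = DiskCachingResolver(cache_dir, fake_fetch)
    second = DiskCachingResolver(cache_dir, fake_fetch)
    a = first.resolveEntity(None, "http://example.com/note.dtd").getByteStream().read()
    b = second.resolveEntity(None, "http://example.com/note.dtd").getByteStream().read()

print(calls)  # ['http://example.com/note.dtd']
```

If two processes miss at the same moment, both fetch and one rename wins; the cost is a duplicate download, not a corrupt cache entry.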

2008-02-16 Thread A.M. Kuchling
A.M. Kuchling added the comment: What if we just tried to make the remote accesses apparent to the user, by making a warnings.warn() call in the default implementation that was deactivated by a setFeature() call. With a warning, code will continue to run but the user will at least be aware
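
Andrew's proposal can be tried out without touching the library by subclassing `xml.sax.handler.EntityResolver`, whose default `resolveEntity()` simply returns the system ID for the reader to open. A sketch under assumptions: the class and warning category are illustrative, and a constructor flag stands in for the `setFeature()` switch he describes, since the proposal does not name a feature.

```python
import urllib.parse
import warnings
from xml.sax.handler import EntityResolver


class RemoteFetchWarning(UserWarning):
    """Illustrative warning category for network fetches of DTDs and entities."""


class WarningEntityResolver(EntityResolver):
    """Behaves like the default resolver but warns before remote fetches."""

    def __init__(self, warn=True):
        self.warn = warn  # stand-in for the proposed setFeature() switch

    def resolveEntity(self, publicId, systemId):
        scheme = urllib.parse.urlsplit(systemId or "").scheme.lower()
        if self.warn and scheme in ("http", "https", "ftp"):
            warnings.warn(
                f"XML parser will fetch {systemId} over the network; "
                "consider a local copy or a caching resolver",
                RemoteFetchWarning,
                stacklevel=2,
            )
        # Same return value as the default: the system ID itself, which the
        # reader then opens (and, for a URL, downloads).
        return super().resolveEntity(publicId, systemId)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = WarningEntityResolver().resolveEntity(None, "http://example.com/note.dtd")
    quiet = WarningEntityResolver(warn=False).resolveEntity(None, "http://example.com/note.dtd")

print(result, len(caught))  # http://example.com/note.dtd 1
```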

2008-02-16 Thread Martin v. Löwis
Martin v. Löwis added the comment: -1 on issuing a warning. I really cannot see much of a problem in this entire issue. XML was designed to be straightforwardly usable over the Internet (XML rec., section 1.1), and this issue is a direct consequence of that design decision. You might just as

2008-02-15 Thread A.M. Kuchling
New submission from A.M. Kuchling: The W3C posted an item at http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic describing how their DTDs are being fetched up to 130M times per day. The Python parsers are part of the problem, as noted by Paul Boddie on the python-advocacy

2008-02-15 Thread A.M. Kuchling
Changes by A.M. Kuchling: -- type: -> resource usage

2008-02-15 Thread A.M. Kuchling
A.M. Kuchling added the comment: Here's a simple test to demonstrate the problem:

    from xml.sax import make_parser
    from xml.sax.saxutils import prepare_input_source
    parser = make_parser()
    inp = prepare_input_source('file:file.xhtml')
    parser.parse(inp)

file.xhtml contains:

    <?xml version="1.0"

2008-02-15 Thread A.M. Kuchling
Changes by A.M. Kuchling: -- priority: -> urgent

2008-02-15 Thread Martin v. Löwis
Martin v. Löwis added the comment: On systems that support catalogs, the parsers should be changed to support public identifiers, using local copies of these DTDs. However, I see really no way how the library could avoid resolving the DTDs altogether. The blog is WRONG in claiming that the
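
The public-identifier approach Martin describes can be approximated with a plain mapping from public IDs to local files. A sketch, assuming the expat-based reader: the dict stands in for a real OASIS XML catalog (parsing catalog files is out of scope here), `CatalogResolver` is illustrative, and the one-line DTD written to a temporary directory is a placeholder for a locally installed copy. Unknown public IDs fall back to the default behaviour, which still fetches the system ID.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class CatalogResolver(EntityResolver):
    """Illustrative catalog lookup: public identifier -> local DTD file."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.hits = []

    def resolveEntity(self, publicId, systemId):
        path = self.catalog.get(publicId)
        if path is None:
            # Unknown public ID: default behaviour (the reader opens systemId).
            return super().resolveEntity(publicId, systemId)
        self.hits.append(publicId)
        source = xml.sax.InputSource(systemId)
        source.setPublicId(publicId)
        with open(path, "rb") as f:
            source.setByteStream(io.BytesIO(f.read()))
        return source


XHTML_STRICT = "-//W3C//DTD XHTML 1.0 Strict//EN"
DOC = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
       b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html/>')

with tempfile.TemporaryDirectory() as catalog_dir:
    dtd_path = os.path.join(catalog_dir, "xhtml1-strict.dtd")
    with open(dtd_path, "wb") as f:
        f.write(b"<!ELEMENT html EMPTY>")  # placeholder for the installed DTD
    resolver = CatalogResolver({XHTML_STRICT: dtd_path})
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(DOC))

print(resolver.hits)  # ['-//W3C//DTD XHTML 1.0 Strict//EN']
```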