A.M. Kuchling added the comment:

Here's a simple test to demonstrate the problem:

from xml.sax import make_parser
from xml.sax.saxutils import prepare_input_source

parser = make_parser()
inp = prepare_input_source('file:file.xhtml')
parser.parse(inp)

file.xhtml contains:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" />

If you insert a debug print into saxutils.prepare_input_source, in the
branch which uses urllib.urlopen(), you get the above list of inputs
accessed: the XHTML 1.1 DTD, which is nicely modular and pulls in all
those other files.

I don't see a good way to fix this without breaking backward
compatibility to some degree. The external-general-entities feature
defaults to 'on', which enables this fetching; we could change the
default to 'off', which would save the parsing effort, but would also
mean that entities like &eacute; would no longer be defined.

If we had catalog support, we could ship the XHTML 1.1 DTDs and any
other DTDs in wide use, but we don't.

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2124>
__________________________________
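
For reference, a minimal sketch of the 'off' setting discussed above, written for Python 3 and assuming the stock expat-based SAX reader. The names ElementCollector and parse_without_external_entities are hypothetical and exist only for this illustration; the feature constant and setFeature() call are the standard xml.sax ones. With the feature off the parser should not go out to fetch the DTD, at the cost mentioned above: entities declared only in the DTD are not available.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Same document as file.xhtml above, held in memory for the example.
XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" />
"""


class ElementCollector(ContentHandler):
    """Hypothetical handler: records the name of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data):
    """Parse bytes with external general entities switched off."""
    parser = xml.sax.make_parser()
    # Features must be set before parsing starts.
    parser.setFeature(feature_external_ges, False)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


if __name__ == "__main__":
    print(parse_without_external_entities(XHTML))
```

Setting the feature explicitly on each parser makes the behaviour independent of whatever the library default happens to be.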