Sam Ruby wrote:
> If we can agree on the behavior, I would be glad to write up a patch.
>
> It seems to me that the simplest way to proceed would be for the code
> that attempts to resolve character references (both named and numeric)
> in attributes to be isolated in a single method. Subclasses that desire
> different behavior (including the existing Python 2.4 and prior
> behaviour) could simply override this method.

In SGML, this is problematic: the named things are not character references, they are entity references, and they do not necessarily expand to a character. For example, &author; might expand to "Martin v. Löwis", and &logo; might refer to a bitmap image which is unparsed.

That said, providing an overridable replacement function sounds like the right approach. To keep with tradition, I would still distinguish between character references and entity references, i.e. provide two overridable functions instead. Returning None could mean that no replacement is available.

As for default implementations, I think they should do what currently happens: entity references are replaced according to entitydefs, and character references are replaced by the corresponding byte if their value is smaller than 256.

Contrary to what others said, it appears that SGML *does* support hexadecimal character references, provided that the SGML declaration contains the HCRO definition (which, for HTML and XML, is defined as HCRO "&#x"). So it seems safe to process hex character references by default (although it isn't safe to assume Unicode, IMO).

Regards,
Martin
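
P.S. To make the proposal concrete, here is a minimal sketch of the two-hook idea, not a patch against sgmllib. All names (ReferenceResolver, convert_charref, convert_entityref, resolve, UnicodeResolver) are hypothetical and chosen only for illustration. It assumes that returning None means "no replacement available" and that an unresolved reference is left in the text unchanged. It is written for Python 3, so chr() of a value below 256 stands in for the single byte mentioned above.

```python
import re

# Hypothetical sketch; none of these names are taken from sgmllib.
# Group 1: body of a character reference ("65" or "x41").
# Group 2: name of an entity reference ("amp").
_REFERENCE = re.compile(
    r"&(?:#([xX][0-9a-fA-F]+|[0-9]+)|([A-Za-z][-.A-Za-z0-9]*));"
)


class ReferenceResolver:
    # Plays the role of entitydefs: entity name -> replacement text.
    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

    def convert_charref(self, ref):
        """Resolve a decimal or hex character reference body, or return None."""
        try:
            if ref[:1] in ("x", "X"):
                code = int(ref[1:], 16)
            else:
                code = int(ref, 10)
        except ValueError:
            return None
        # Default policy: only values below 256 are replaced.
        if not 0 <= code < 256:
            return None
        return chr(code)

    def convert_entityref(self, name):
        """Resolve an entity reference via entitydefs, or return None."""
        return self.entitydefs.get(name)

    def resolve(self, text):
        """Replace every reference for which a hook returns a replacement."""
        def replace(match):
            charref, entityref = match.groups()
            if charref is not None:
                replacement = self.convert_charref(charref)
            else:
                replacement = self.convert_entityref(entityref)
            # None means "no replacement": keep the reference as written.
            return match.group(0) if replacement is None else replacement

        return _REFERENCE.sub(replace, text)


class UnicodeResolver(ReferenceResolver):
    """A subclass that opts in to treating character numbers as Unicode."""

    def convert_charref(self, ref):
        is_hex = ref[:1] in ("x", "X")
        try:
            return chr(int(ref[1:], 16) if is_hex else int(ref, 10))
        except (ValueError, OverflowError):
            return None
```

With this split, ReferenceResolver().resolve("&#65;&#x42; &#8212; &author;") leaves &#8212; and &author; untouched, while UnicodeResolver resolves &#8212; as well. A subclass that knows about &author; only needs to extend entitydefs or override convert_entityref.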