> I don't know. Is an XML document ill-formed if it doesn't contain an
> XML declaration, is not in UTF-8 or UTF-16, but there's external
> encoding info?

If there is external encoding info matching the actual encoding, the
document would be well-formed. Of course, preserving that information
would be up to the application.

> This looks good. Now we would have to extend the code to detect and
> replace the encoding in the XML declaration too.

I'm still opposed to making this a codec. Right - for a pure Python
solution, the processing of the XML declaration would still need to be
implemented.

>> I think there could be a much simpler routine to have the same
>> effect. - if it's less than 4 bytes, answer "need more data".
>
> Can there be an XML document that is less than 4 bytes? I guess not.

No, the smallest document has exactly 4 characters (e.g. "<f/>").
However, external entities may be smaller, such as "x".

> But anyway: would a Python implementation of these two functions
> (detect_encoding()/fix_encoding()) be accepted?

I could agree to a Python implementation of this algorithm as long as
it's not packaged as a codec.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
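
For illustration only, here is a minimal sketch of what a plain-function
detect_encoding() along the lines discussed above might look like. It is
not the code from this thread: the signature, the `final` flag, and the
convention of returning None for "need more data" are assumptions, and
the byte patterns follow the autodetection table in Appendix F of the
XML 1.0 specification (EBCDIC is left out). A fix_encoding() counterpart
is not shown.

```python
import re

# Hypothetical helper, not an existing stdlib API. Matches the encoding
# pseudo-attribute inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_encoding(data, final=False):
    """Guess the encoding of XML bytes; return None if more data is needed."""
    # Fewer than 4 bytes (or an incomplete "<?xml") is not enough to decide,
    # unless the caller says no more data will arrive (e.g. an entity "x").
    if not final and (len(data) < 4 or b"<?xml".startswith(data)):
        return None
    # Byte order marks: test the 4-byte forms before the 2-byte forms.
    if data.startswith((b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00")):
        return "utf-32"
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # No BOM: infer the code unit width from how "<" or "<?" is laid out.
    if data.startswith(b"\x00\x00\x00<"):
        return "utf-32-be"
    if data.startswith(b"<\x00\x00\x00"):
        return "utf-32-le"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # ASCII-compatible start: the declaration itself names the encoding.
    if data.startswith(b"<?xml"):
        if b"?>" not in data and not final:
            return None  # declaration not complete yet
        match = _DECL.match(data)
        if match:
            return match.group(1).decode("ascii").lower()
    # No declaration and no BOM: XML defaults to UTF-8.
    return "utf-8"
```

With this sketch, detect_encoding(b"<f") returns None, while
detect_encoding(b"x", final=True) falls back to "utf-8". External
encoding info, if present, would have to override the result in the
caller; that step is outside this function.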