On 12/1/05, Kevin Krammer <[EMAIL PROTECTED]> wrote:
> Isn't an XML file considered to be in ASCII unless a different encoding is
> specified by the processing instruction?

Not really. Unless other information is given, AFAIK an XML file is to be
assumed to be in UTF-8. Quote from
http://www.w3.org/TR/REC-xml/#charencoding :

  "In the absence of information provided by an external transport
  protocol (e.g. HTTP or MIME), it is a fatal error for an entity
  including an encoding declaration to be presented to the XML processor
  in an encoding other than that named in the declaration, or for an
  entity which begins with neither a Byte Order Mark nor an encoding
  declaration to use an encoding other than UTF-8. Note that since ASCII
  is a subset of UTF-8, ordinary ASCII entities do not strictly need an
  encoding declaration."

As a consequence, a file containing only ASCII characters but no encoding
information would be valid XML. But *assuming* that any file without
encoding information will be valid ASCII is plain wrong. Valid ASCII is
always valid UTF-8, but not necessarily the other way around.

Christian
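
P.S. A minimal sketch in Python of the point above, assuming the standard
library's xml.etree.ElementTree (expat-based) as the XML processor. The
sample documents and the helper names is_valid_ascii() and parses() are
made up for illustration; none of them come from the spec.

```python
import xml.etree.ElementTree as ET

def is_valid_ascii(data: bytes) -> bool:
    """Return True if every byte in data is a 7-bit ASCII character."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts data as well-formed."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Hypothetical documents: no BOM and no encoding declaration in any of them.
ascii_doc = b"<greeting>hello</greeting>"
utf8_doc = "<greeting>h\u00e9llo</greeting>".encode("utf-8")
latin1_doc = "<greeting>h\u00e9llo</greeting>".encode("latin-1")

# Pure ASCII decodes identically as ASCII and as UTF-8, so it needs no
# declaration.
same_text = ascii_doc.decode("ascii") == ascii_doc.decode("utf-8")

# The UTF-8 document is fine for the parser but is not valid ASCII.
utf8_text = ET.fromstring(utf8_doc).text

print(same_text, is_valid_ascii(ascii_doc), parses(ascii_doc))
print(is_valid_ascii(utf8_doc), parses(utf8_doc), utf8_text)
# Latin-1 bytes without a declaration are not valid UTF-8, so the parser
# is expected to reject them.
print(is_valid_ascii(latin1_doc), parses(latin1_doc))
```

So a consumer that wants to be safe should decode undeclared input as
UTF-8 (or just hand the raw bytes to the XML parser) rather than as ASCII.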
