Re: Nicest UTF

John Cowan Fri, 10 Dec 2004 17:49:25 -0800

Marcin 'Qrczak' Kowalczyk scripsit:

> http://www.w3.org/TR/2000/REC-xml-20001006#charsets
> implies that the appropriate level for parsing XML is code points.


You are reading the XML Recommendation incorrectly.  It is not defined
in terms of code units (8-bit, 16-bit, or 32-bit) but in terms of
characters.  XML processors are required to process UTF-8 and UTF-16,
and may process other character encodings or not.  But the internal
model is that of characters.  Thus surrogate code points are not
allowed.
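
To make the point concrete, here is a minimal sketch of the Char
production from section 2.2 of the XML 1.0 Recommendation, written as a
check on decoded characters rather than on encoded bytes or 16-bit units.
The function names is_xml_char and first_invalid_xml_char are
hypothetical, chosen for illustration only; they do not come from the
Recommendation or from any XML library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if the code point cp matches the XML 1.0 Char production.

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF        # stops just below the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # resumes after it; excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def first_invalid_xml_char(text: str) -> int:
    """Return the index of the first character not allowed by Char, or -1.

    Hypothetical helper: it walks an already decoded string, so the check
    is independent of whether the input arrived as UTF-8 or UTF-16.
    """
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return -1
```

The gap between #xD7FF and #xE000 is exactly the surrogate block, so a
lone surrogate such as U+D800 fails the check no matter which encoding
the document used on the wire.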

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  [EMAIL PROTECTED]
Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
The day and hour soon are coming / When all the IT folks say "Gosh!"
It isn't from a clever lawsuit / That Windowsland will finally fall,
But thousands writing open source code / Like mice who nibble through a wall.
        --The Linux-nationale by Greg Baker
