Marco:


>> Furthermore, Viranga's context appears to be XML, in which
>> case it *is* possible to encode *all* Unicode code points
>> using EUC (or ISO-8859-1 or ASCII or ...)
>
>Yes, yes. XML documents can represent characters in at least two ways:

>2) By representing them with numeric references in the form "&#1234;" etc...

>In the context of Unicode and, more generally, plain-text encoding "to
>encode" means only point 1 above, and "&1234;" is just a six-character
>string. BTW, this is also the interpretation of tools (text editor, etc.)
>used to manipulate XML files -- so it is not a pointless distinction for
>someone working in XML.
>
>Point 2, in Unicode speech, is defined a "higher level protocol",



I agree with what you said earlier, but on the other hand, suppose we define UTF-NCR8:

Unicode       bit             code      code      code      code      code
scalar value  pattern         unit 1    unit 2    unit 3    unit 4    unit 5

0020 - 0027   00wwwwww        00100110  00100011  00110011  0011xxxx  00111011
              where xxxx = wwwwww - 11110 (binary)

0028 - 0031   00wwwwww        00100110  00100011  00110100  0011xxxx  00111011
              where xxxx = wwwwww - 101000 (binary)

0032 - 003b   00wwwwww        00100110  00100011  00110101  0011xxxx  00111011
              where xxxx = wwwwww - 110010 (binary)

etc., but with a handful of exceptions, such as

U+0026:                       00100110  01100001  01101101  01110000  00111011

U+003C:                       00100110  01101100  01110100  00111011
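
The table above can be sketched in a few lines of Python. This is only an illustration: the names utf_ncr8_encode and EXCEPTIONS are made up here, and only the two exceptions listed above (U+0026 and U+003C) are handled; whatever else belongs to the "handful" is left out.

```python
# Hypothetical sketch of the "UTF-NCR8" encoding described in the table above.
# Only the two exceptions actually listed in the table are included.
EXCEPTIONS = {
    0x26: b"&amp;",  # U+0026
    0x3C: b"&lt;",   # U+003C
}


def utf_ncr8_encode(text: str) -> bytes:
    """Encode each scalar value as a decimal numeric character reference."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in EXCEPTIONS:
            out += EXCEPTIONS[cp]
        else:
            # '&' '#' decimal digits ';', all as 8-bit ASCII code units
            out += f"&#{cp};".encode("ascii")
    return bytes(out)


def last_digit_unit(cp: int, range_start_decimal: int) -> int:
    """The 0011xxxx code unit from the table: xxxx = value - start of decade."""
    return 0x30 | (cp - range_start_decimal)
```

For example, utf_ncr8_encode("A<B") gives b"&#65;&lt;&#66;", and U+0020 comes out as the five code units 26 23 33 32 3B (hex), matching the first row of the table.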



We can also define UTF-NCR16 in just the same way, except that the code units are 16-bit, zero-extended equivalents of the UTF-NCR8 code units. One of the interesting aspects of these encodings is that XML parsers understand them without requiring that the charset be declared, just like UTF-8 and UTF-16.
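
A quick way to see this, sketched with Python's standard xml.etree.ElementTree. It assumes that only the character data is written in UTF-NCR8 while the markup itself stays plain ASCII; the sample document and variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: character data in UTF-NCR8, markup left as plain ASCII.
ncr8_doc = b"<r>&#1234;&amp;&lt;</r>"

# No XML declaration and no charset label: the parser still recovers
# U+04D2, "&" and "<" from the references.
text = ET.fromstring(ncr8_doc).text

# UTF-NCR16: the same code units, zero-extended to 16 bits
# (big-endian byte order is an arbitrary choice for this sketch).
ncr16_doc = ncr8_doc.decode("ascii").encode("utf-16-be")
high_bytes = ncr16_doc[0::2]  # the zero extension
low_bytes = ncr16_doc[1::2]   # identical to the UTF-NCR8 code units
```

Here text ends up as the three characters U+04D2, U+0026, U+003C, and low_bytes is byte-for-byte the same as ncr8_doc.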

Now, if someone interpreted Misha to mean one of these encodings, then he would be talking about encoding in the same sense as you. :-)



Peter
