Tatsuo Ishii wrote:
I don't understand the whole discussion.
Why do you think that employing the Unicode code point as the chr()
argument could avoid endianness issues? Are you going to represent the
Unicode code point as UCS-4? Then you have to specify the endianness
anyway. (see the UCS-4 standard for more details)
The code point is simply a number. The result of chr() will be a text
value one char (not one byte) wide, in the relevant database encoding.
U+nnnn maps to the same Unicode char and hence the same UTF8 encoding
pattern regardless of endianness. e.g. U+00a9 is the copyright symbol on
all machines. So to get this char in a UTF8 database you could call
"select chr(169)" and get back the byte pattern \xC2A9.
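To make the distinction concrete, here is a rough sketch in Python rather than SQL. It is only an analogy for the proposed chr() behaviour, not the backend code: the code point is a plain integer, its UTF-8 form is a fixed byte sequence, and byte order only shows up if you serialize it as a fixed-width UCS-4/UTF-32 value. The variable names are made up for illustration.

```python
# Analogy only: Python's chr() and codecs stand in for the proposed
# PostgreSQL chr() in a UTF8 database.
code_point = 0xA9  # U+00A9 COPYRIGHT SIGN, i.e. decimal 169

ch = chr(code_point)

# UTF-8 is defined as a byte sequence, so there is no byte order to choose.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex())  # c2a9

# Endianness only matters for a fixed-width serialization such as UCS-4.
ucs4_be = ch.encode("utf-32-be")
ucs4_le = ch.encode("utf-32-le")
print(ucs4_be.hex())  # 000000a9
print(ucs4_le.hex())  # a9000000
```

The integer 169 is the same in both cases; only the UCS-4 serialization differs, and chr() never has to produce one.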
Or are you going to represent the Unicode code point as a character
string such as 'U+0259'? Then representing any encoding as a string could
avoid endianness issues anyway, and I don't see that a Unicode code point
is any better than the others.
The argument will be a number, as now.
Also I'd like to point out that all encodings have their own code point
systems as far as I know. For example, EUC-JP has its corresponding
code point systems, ASCII, JIS X 0208 and JIS X 0212. So I don't see why
we can't use "code point" as chr()'s argument for other encodings (of
course we need optional parameter specifying which character set is
Where can I find the tables that map code points (as opposed to
encodings) to characters for these others?
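For reference, the EUC-JP case mentioned above can be sketched as follows. JIS X 0208 identifies each character by a row/cell (ku-ten) pair, and EUC-JP encodes that pair by adding 0xA0 to each number. This is a Python illustration that relies on Python's bundled euc_jp codec. The helper name jisx0208_to_eucjp is hypothetical, and nothing here is a proposal for how chr() should accept such arguments.

```python
def jisx0208_to_eucjp(row: int, cell: int) -> bytes:
    """Hypothetical helper: map a JIS X 0208 ku-ten pair to EUC-JP bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # EUC-JP places JIS X 0208 in the high half: each byte is value + 0xA0.
    return bytes([0xA0 + row, 0xA0 + cell])

# Row 16, cell 1 is the first kanji in JIS X 0208.
encoded = jisx0208_to_eucjp(16, 1)
print(encoded.hex())  # b0a1

# Decode with Python's euc_jp codec to see which character that is.
decoded = encoded.decode("euc_jp")
print(hex(ord(decoded)))  # 0x4e9c
```

Here the "code point" is a pair of numbers tied to one character set, so an integer argument to chr() would also need to say which set it refers to.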