Re: [HACKERS] Bug in UTF8-Validation Code?

Mark Dilger Mon, 02 Apr 2007 16:10:47 -0700

Mark Dilger wrote:

> Tom Lane wrote:
>> Mark Dilger <[EMAIL PROTECTED]> writes:
>>> pgsql=# select chr(14989485);
>>>  chr
>>> -----
>>>  中
>>> (1 row)
>>
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>>
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>>
>> Not sure what to do in other multibyte encodings.
>
> "Not sure what to do in other multibyte encodings" was pretty much my
> rationale for this particular behavior. I standardized on network byte
> order (big-endian) because there are only two endiannesses to choose
> from, and the other seems to be a more surprising choice.
>
> I looked around on the web for a standard for how to convert an integer
> into a valid multibyte character and didn't find anything.
> Andrew - Supernews has said upthread that chr() is clearly wrong and
> needs to be fixed. If so, we need some clear definition of what "fixed"
> means.
>
> Any suggestions?
>
> mark
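The sketch below contrasts the two readings of chr(N) discussed above. It is a minimal Python illustration that assumes a UTF8 database. The function names are hypothetical, and none of this is PostgreSQL's actual implementation.

`encode_utf8` follows the standard UTF-8 bit layout (RFC 3629). That layout is one candidate answer to how an integer becomes a valid multibyte character when the integer is taken as a code point. Under that reading, the character in the psql session is chr(20013), that is, U+4E2D. Its UTF-8 bytes E4 B8 AD, read as a big-endian integer, are exactly the 14989485 used in the example.

```python
# Hypothetical sketch contrasting two readings of chr(N) in a UTF8
# database. Neither function is PostgreSQL code.

def encode_utf8(cp: int) -> bytes:
    """Encode a Unicode code point as UTF-8 (RFC 3629 bit layout)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"{cp} is not a Unicode scalar value")
    if cp < 0x80:                       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 4 bytes: 11110xxx + 3 continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def chr_code_point(n: int) -> str:
    """chr() as Tom Lane suggests: n is a Unicode code point."""
    return encode_utf8(n).decode("utf-8")

def chr_network_bytes(n: int) -> str:
    """chr() as described in the thread: n's big-endian bytes are the UTF-8 sequence."""
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    ch = raw.decode("utf-8")            # raises UnicodeDecodeError if not valid UTF-8
    if len(ch) != 1:
        raise ValueError(f"{n} does not encode exactly one character")
    return ch

if __name__ == "__main__":
    as_bytes = chr_network_bytes(14989485)
    print(f"U+{ord(as_bytes):04X}")                    # U+4E2D
    print(encode_utf8(0x4E2D).hex())                   # e4b8ad
    print(int.from_bytes(encode_utf8(0x4E2D), "big"))  # 14989485
    print(chr_code_point(20013) == as_bytes)           # True
```

Under the code-point reading, 14989485 is rejected outright because it lies above U+10FFFF. Under the byte reading, most integers are rejected because their bytes are not a well-formed UTF-8 sequence for a single character.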

Another issue to consider when thinking about the correct definition of chr() is the invariant ascii(chr(X)) = X. This gets weird if X is greater than 255. If nothing else, the name "ascii" is no longer appropriate.
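The sketch below checks whether the invariant ascii(chr(X)) = X survives each reading of chr(). It is a minimal illustration, assuming Python's built-in chr()/ord() stand in for the SQL functions under the code-point reading. The helper names are hypothetical, and this is not PostgreSQL code.

```python
# Hypothetical sketch: does ascii(chr(X)) = X hold under each reading of chr()?
# Python stand-ins only; not PostgreSQL code.

def roundtrip_code_point(x: int) -> bool:
    """Code-point reading: chr() and its inverse behave like Python's chr()/ord()."""
    if not 0 <= x <= 0x10FFFF or 0xD800 <= x <= 0xDFFF:
        return False                    # chr(x) would be rejected
    return ord(chr(x)) == x

def roundtrip_network_bytes(x: int) -> bool:
    """Byte reading: x's big-endian bytes must form exactly one UTF-8 character,
    and the inverse reads that character's UTF-8 bytes back as an integer."""
    if x < 0:
        return False
    raw = x.to_bytes(max(1, (x.bit_length() + 7) // 8), "big")
    try:
        ch = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False                    # chr(x) would be rejected
    return len(ch) == 1 and int.from_bytes(ch.encode("utf-8"), "big") == x

if __name__ == "__main__":
    for x in (65, 255, 256, 0x4E2D, 14989485):
        print(x, roundtrip_code_point(x), roundtrip_network_bytes(x))
    # 65 True True
    # 255 True False
    # 256 True False
    # 20013 True False
    # 14989485 False True
```

Under the code-point reading, the invariant holds for every valid code point, including those above 255, where the result is no longer ASCII. Under the byte reading, only the sparse set of integers whose bytes happen to be a single well-formed UTF-8 character round-trip at all.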


mark

