Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> On 05/16/2014 06:05 PM, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above 10ffff.  Should we back-patch that, or just do it in HEAD?
> +1 for back-patching. A value that cannot be restored is bad, and I
> can't imagine any legitimate use case for producing a Unicode character
> larger than U+10FFFF with chr(x), when the rest of the system doesn't
> handle it. Fully supporting such values might be useful, but that's a
> different story.

Well, AFAICT "the rest of the system" does handle any code point up to
U+1FFFFF.  It's only pg_utf8_islegal that's being picky.  So another
possible answer is to weaken the check in pg_utf8_islegal.  However, that
could create interoperability concerns with other software, and as you
say, the use case for larger values seems pretty thin.

Actually, after re-reading the spec, there's more to it than this: chr()
will allow creating UTF8 sequences that correspond to the surrogate-pair
codes, which are expressly disallowed in UTF8 by the RFCs.  Maybe we
should apply pg_utf8_islegal to the result string rather than duplicating
its checks?

BTW, there are various places that have comments or ifdef'd-out code
anticipating possible future support of 5- or 6-byte UTF8 sequences,
which were specified in RFC 2279 but then rescinded by RFC 3629.  I guess
as a matter of cleanup we should think about removing that stuff.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers