On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> Seems I'm bad at communicating in english,

I hope you're not saying this because of my misunderstandings!

> so here is C variant of
> my proposal to bring \u escaping into extended strings.  Reasons:
>
> - More people are familiar with \u escaping, as it's standard
>   in Java/C#/Python, probably more..
> - U& strings will not work when stdstr=off.
>
> Syntax:
>
>   \uXXXX      - 16-bit value
>   \UXXXXXXXX  - 32-bit value
>
> Additionally, both \u and \U can be used to specify UTF-16 surrogate
> pairs to encode characters with value > 0xFFFF.  This is exact behaviour
> used by Java/C#/Python.  (except that Java does not have \U)

Are you sure that this handling of surrogates is correct?  The best
answer I've managed to find on the Unicode consortium's site is:

  http://unicode.org/faq/utf_bom.html#utf16-7

it says:

  They are invalid in interchange, but may be freely used internal to an
  implementation.

I think this means they consider the handling of them you noted above,
in other languages, to be an error.

-- 
  Sam  http://samason.me.uk/
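
For reference, the surrogate-pair arithmetic under discussion can be sketched in C roughly as follows. This is only an illustration of the standard UTF-16 combining formula, not code from the proposed patch; the function names (is_high_surrogate, is_low_surrogate, combine_surrogates, is_valid_scalar) are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* High (leading) surrogates occupy U+D800..U+DBFF. */
static int is_high_surrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDBFF;
}

/* Low (trailing) surrogates occupy U+DC00..U+DFFF. */
static int is_low_surrogate(uint32_t c)
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

/*
 * Combine a high/low surrogate pair into the code point it encodes.
 * Each surrogate carries 10 bits; the result lies in U+10000..U+10FFFF.
 * Hypothetical helper, not taken from the patch being discussed.
 */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * A code point is acceptable as a standalone character only if it is
 * within the Unicode range and is not itself a surrogate.
 */
static int is_valid_scalar(uint32_t c)
{
    return c <= 0x10FFFF && !is_high_surrogate(c) && !is_low_surrogate(c);
}
```

Under these assumptions, an escape sequence such as \uD834\uDD1E would be combined into the single code point U+1D11E, while a lone \uD834 would fail the is_valid_scalar check. Whether a parser should accept the paired form at all is exactly the open question above.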