On 06/11/2013 09:16 AM, Hannu Krosing wrote:



>> It's a pity that we don't have a non-error-producing conversion function
>> (or if we do, I haven't found it). Then we might adopt a rule for
>> processing unicode escapes that said "convert unicode escapes to the
>> database encoding
> only when extracting JSON keys or values to text does it make sense to
> unescape to the database encoding.

That's exactly the scenario we are talking about. When emitting JSON, the functions have always emitted unicode escapes exactly as they appear in the text, and will continue to do so.


> strings inside JSON itself are by definition utf8


We have deliberately extended that to allow JSON strings to be in any database server encoding. That was argued back in the 9.2 timeframe, and I am not interested in re-litigating it.

The only issue at hand is how to handle unicode escapes (which in their string form are pure ASCII) when emitting text strings.
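To make that distinction concrete, here is a small illustration in Python (not PostgreSQL internals): the escape sequence itself is plain ASCII, and the problem only appears once it is decoded into a real character that the server encoding cannot represent.

```python
import json

raw = '"\\u20AC2.00"'       # a JSON string literal containing an escape
raw.encode("ascii")         # succeeds: the escape itself is pure ASCII

value = json.loads(raw)     # decoding the escape yields the euro sign
print(value)                # €2.00

try:
    value.encode("latin-1")  # Latin1 has no code point for U+20AC
except UnicodeEncodeError:
    print("U+20AC is not representable in Latin1")
```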

>> if possible, and if not then emit them unchanged." which might be a
>> reasonable compromise.
> I'd opt for "... and if not then emit them quoted". The default should
> be not losing any data.

I don't know what this means at all. Quoted how? Let's say I have a Latin1 database containing the following JSON string: "\u20AC2.00". In a UTF8 database the text representation of this is €2.00; what are you saying it should be in the Latin1 database?
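For what it's worth, the earlier "convert if possible, otherwise emit unchanged" rule can be sketched as follows. This is a hypothetical illustration in Python; `unescape_for_encoding` is an invented name, not anything in PostgreSQL, and it ignores surrogate pairs, which a real implementation would have to handle.

```python
import re

def unescape_for_encoding(s: str, encoding: str) -> str:
    """Replace each \\uXXXX escape with the real character only when
    the target encoding can represent it; otherwise keep the escape."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        try:
            ch.encode(encoding)
            return ch
        except UnicodeEncodeError:
            return m.group(0)   # leave the escape as-is
    return re.sub(r"\\u([0-9a-fA-F]{4})", repl, s)

print(unescape_for_encoding("\\u20AC2.00", "utf-8"))    # €2.00
print(unescape_for_encoding("\\u20AC2.00", "latin-1"))  # \u20AC2.00
```

Under this rule the Latin1 database above would simply emit \u20AC2.00 unchanged, which is lossless but means the same JSON value extracts to different text in different encodings.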

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers