On 06/10/2013 10:18 AM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> After thinking about this some more I have come to the conclusion that
>> we should de-escape \uxxxx sequences, whether or not they are for BMP
>> characters, only when the server encoding is utf8. For any other
>> encoding (which already violates the JSON standard and should be
>> avoided if you're dealing with JSON), we should just pass them through,
>> even in text output. This will be a simple and very localized fix.
> Hmm.  I'm not sure that users will like this definition --- it will seem
> pretty arbitrary to them that conversion of \u sequences happens in some
> databases and not others.

Then what should we do when there is no matching codepoint in the database encoding? First we'll have to delay the evaluation so it's not done over-eagerly, and then we'll have to try the conversion and throw an error if it doesn't work. The second part is what's happening now, but the delayed evaluation is not.
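A minimal Python sketch of that delayed evaluation, under some loud assumptions: the stored value is the raw text between the quotes with its escapes intact, Python codec names such as "latin-1" stand in for database encodings, and emit_json_string is a hypothetical name, not anything in the PostgreSQL source. Nothing is de-escaped until output is produced, and a code point with no equivalent in the target encoding raises an error:

```python
import json


def emit_json_string(raw: str, encoding: str) -> bytes:
    """Hypothetical helper: convert a stored JSON string body at output time.

    `raw` is the text between the quotes with any \\uXXXX escapes still in it,
    so the conversion only happens when a caller actually asks for output.
    Python codec names (e.g. "latin-1") stand in for database encodings.
    """
    # json.loads decodes \uXXXX escapes, combining UTF-16 surrogate pairs.
    text = json.loads('"' + raw + '"')
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # No matching code point in the target encoding: report an error.
        cp = ord(text[exc.start])
        raise ValueError(
            f"U+{cp:04X} has no equivalent in encoding {encoding}"
        ) from None
```

With this shape, r"caf\u00e9" converts cleanly to latin-1, while r"\u20ac" (the euro sign) fails for latin-1 but succeeds for utf-8.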

Or we could abandon the conversion altogether, but that doesn't seem very friendly either. I suspect the most common reason people use these sequences is that the database is UTF8 but the client encoding is not.

Frankly, if you want to use Unicode escapes, you should really be using a UTF8 encoded database if at all possible.
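For comparison, the UTF8-only rule from the quoted proposal can be sketched the same way. The function name is hypothetical, the encoding argument is assumed to be a PostgreSQL-style name such as "UTF8" or "LATIN1", and it is also assumed (the proposal doesn't say) that the other JSON escapes such as \n are still decoded in the non-UTF8 case. Under UTF8 every escape is decoded; under anything else \uxxxx sequences pass through verbatim:

```python
import json
import re

# Single-character JSON escapes and what they decode to.
_SIMPLE = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
# One JSON escape: \uXXXX or a single-character escape such as \n or \\.
_ESCAPE = re.compile(r'\\(?:u[0-9a-fA-F]{4}|["\\/bfnrt])')


def json_string_as_text(raw: str, server_encoding: str) -> str:
    """Hypothetical text output of a JSON string body under the UTF8-only rule."""
    if server_encoding.upper() in ("UTF8", "UTF-8"):
        # Full de-escaping, including surrogate pairs.
        return json.loads('"' + raw + '"')

    def keep_unicode(m: re.Match) -> str:
        esc = m.group(0)
        # Assumption: simple escapes are decoded; \u escapes are left as-is.
        return esc if esc[1] == "u" else _SIMPLE[esc[1]]

    return _ESCAPE.sub(keep_unicode, raw)
```

Scanning escapes left to right matters here: an escaped backslash followed by "u0041" is decoded as a backslash plus plain text, not mistaken for a \u escape.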



>> We'll still have to deal with this issue when we get to binary storage
>> of JSON, but that's not something we need to confront today.
> Well, if we have to break backwards compatibility when we try to do
> binary storage, we're not going to be happy either.  So I think we'd
> better have a plan in mind for what will happen then.

I don't see any reason why we couldn't store the JSON strings with the Unicode escape sequences intact in the binary format. What the binary format buys us is that it has decomposed the JSON into a tree structure, so instead of parsing the JSON we can just walk the tree, but the leaf nodes of the tree are still (in the case of the nodes under discussion) text-like objects.
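A rough sketch of that idea, with plain Python containers standing in for a hypothetical binary tree. StrLeaf and walk are illustrative names only, not a proposed on-disk format, and object keys are kept already decoded for brevity. Lookups walk the tree without re-parsing any JSON text, and a string leaf keeps its escapes until its text is asked for:

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class StrLeaf:
    """Hypothetical leaf: a JSON string stored with its escapes intact."""
    raw: str  # text between the quotes, e.g. r"caf\u00e9"

    def decoded(self) -> str:
        # De-escaping happens only here, when text output is requested.
        return json.loads('"' + self.raw + '"')


# Stand-in for a binary JSON tree: objects are dicts, arrays are lists,
# strings are StrLeaf; numbers, booleans and null are plain Python values.
Node = Union[Dict[str, "Node"], List["Node"], StrLeaf, int, float, bool, None]


def walk(node: Node, path: List[Union[str, int]]) -> Node:
    """Follow object keys / array indexes without parsing any JSON text."""
    for step in path:
        node = node[step]  # dict lookup or list index
    return node


doc = {"name": StrLeaf(r"caf\u00e9"), "tags": [StrLeaf(r"\u20ac")]}
leaf = walk(doc, ["tags", 0])
print(leaf.raw)  # \u20ac  (escape still intact in storage)
```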

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
